Determining if two file paths point to same file under Linux / C?


Under Linux, I have two file paths A and B:
const char* A = ...;
const char* B = ...;
I now want to determine, should I open(2) them both...
int fda = open(A, ...);
int fdb = open(B, ...);
...will I get two filehandles open to the same file in the filesystem?
To determine this I thought of stat(2):
struct stat
{
dev_t st_dev;
ino_t st_ino;
...
}
Something like (pseudo-code):
bool IsSameFile(const char* sA, const char* sB)
{
stat A = stat(sA);
stat B = stat(sB);
return A.st_dev == B.st_dev && A.st_ino == B.st_ino;
}
Are there any cases where A and B are the same file but IsSameFile would return false?
Are there any cases where A and B are different files but IsSameFile would return true?
Is there a better way to do what I'm trying to do?

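Your program will work in the usual cases because st_ino holds the inode number of the file. An inode number is unique only within a single filesystem, which is why you also need the st_dev comparison you already have; together the two identify the file, so your program will correctly tell whether the two paths refer to the same file.
Note that stat() follows symbolic links, so to find out whether a path is itself a symbolic link you need to call lstat() and check st_mode (for example with S_ISLNK).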

It depends on why exactly you want to avoid opening the same file twice. Your solution is usually the correct one, but there are some situations where files should be considered the same if they have the same absolute path but not if they are links to the same inode. In that case you need to convert the paths to absolute paths and compare them ... see Getting absolute path of a file
You also need to decide whether you consider a symlink to a file equivalent to the file or another symlink to it. For inode equivalence, that determines whether to use stat or lstat. For path equivalence, it determines whether you can use realpath or if you need to get the absolute path without following symlinks.
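The two notions of "same file" discussed above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function names same_inode and same_resolved_path and the follow_symlinks parameter are made up for illustration, and both functions return -1 when a path cannot be examined.

```c
#define _XOPEN_SOURCE 700  /* expose lstat() and realpath() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Inode equivalence: 1 if both paths name the same inode on the same
 * device, 0 if not, -1 if either path cannot be examined.
 * follow_symlinks selects stat() (follow) or lstat() (do not follow). */
int same_inode(const char *a, const char *b, bool follow_symlinks)
{
    struct stat sa, sb;
    int ra = follow_symlinks ? stat(a, &sa) : lstat(a, &sa);
    int rb = follow_symlinks ? stat(b, &sb) : lstat(b, &sb);
    if (ra != 0 || rb != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Path equivalence: 1 if both paths resolve to the same absolute path
 * (symlinks followed by realpath()), 0 if not, -1 on error. */
int same_resolved_path(const char *a, const char *b)
{
    char *ra = realpath(a, NULL);  /* malloc()'d result, or NULL */
    char *rb = realpath(b, NULL);
    int result = (ra && rb) ? (strcmp(ra, rb) == 0) : -1;
    free(ra);
    free(rb);
    return result;
}
```

Two hard links to one file compare equal under same_inode but not under same_resolved_path, which is exactly the distinction made above.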

Related

How do you check to see if two different file reference "strings" refer to the same file?

We have a function in straight C that is intended to tell us if two file references refer to the same file. Right now it first checks to see if the passed in parameters are actually the same pointer, and if not, it looks at the characters to see if the strings of characters are the same. But that doesn't account for the possibility of different ways to refer to the same file. For example...
"/d1/d2/d3/theFile" compared to "../d3/theFile"
Those could be the same file or not, depending on the directory structure and where the current point of reference is. I'd like to improve the following function to be able to check to see if the string reference to a file refers to the same file as another string...
static bool is_same_file (const char *f1, const char *f2) {
if (f1==f2)
return true;
return (0==strcmp(f1, f2));
}
I imagine that it might be possible to try opening both files and use that in some way. But I don't know how to check to see if the opened files are the same physical file on the drive, and not just coincidentally the same file that happen to exist in different directories. In C#, there's a FileInfo class that can be used to find a file based on that reference string, and then you can compare complete directory information, file name, and so on. Is there a way to do something similar in C?

On a POSIX system you can use stat() to check if they have the same inode number on the same filesystem.
Where struct stat provides a generation number (st_gen, a BSD extension that Linux's struct stat does not have), you should also check it, to handle a race condition due to the file being deleted and a new file getting its inode number between the two calls.
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
static bool is_same_file (const char *f1, const char *f2) {
    struct stat s1, s2;
    if (stat(f1, &s1) < 0) {
        perror("stat f1");
        return false;
    }
    if (stat(f2, &s2) < 0) {
        perror("stat f2");
        return false;
    }
    /* On systems that have st_gen, also compare s1.st_gen == s2.st_gen. */
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}
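On Linux, where there is no generation number to compare, one way around the inode-reuse race is to hold both files open while comparing, because an inode cannot be reused while a descriptor still refers to it. The sketch below is my own variation rather than part of the answer above; the name same_open_file is hypothetical, and opening with O_RDONLY is an assumption that may be unsuitable for FIFOs or device nodes.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths open the same file, 0 if not, -1 on error.
 * Both descriptors stay open until after the comparison, so neither
 * inode number can be recycled in between. */
int same_open_file(const char *f1, const char *f2)
{
    int result = -1;
    struct stat s1, s2;
    int fd1 = open(f1, O_RDONLY);
    int fd2 = open(f2, O_RDONLY);

    if (fd1 >= 0 && fd2 >= 0 &&
        fstat(fd1, &s1) == 0 && fstat(fd2, &s2) == 0)
        result = (s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino);

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return result;
}
```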

How to delete a file in C using a file-descriptor?

In my code, I create a file with a random name using the mkstemp() function (I'm on Linux). This function returns an int that is a file descriptor.
int fd;
char temp[] = "tempXXXXXX";
fd = mkstemp(temp);
Later I can access the file using fdopen() through that int file descriptor.
FILE *file_ptr = NULL;
file_ptr = fdopen(fd, "w+");
But at the end of my program, I would like to see if the file still exists with the random name it was given when I created it (the program should change that file name if successful). I can set a flag if the rename() function run on that file is successful, but I still don't know how to delete it when I only have its file descriptor.
if rename files => remove the temp file
How can I do that? Or is there a way to get the files name if I have its file descriptor?

Neither C nor POSIX (since you are using POSIX library functions) defines a way to delete a file via an open file descriptor. And that makes sense, because the kind of deletion you're talking about is actually to remove a directory entry, not the file itself. The same file can be hard linked into the directory tree in multiple places, with multiple names. The OS takes care of removing its data from storage, or at least marking it as available for reuse, after the last hard link to it is removed from the directory tree and no process any longer has it open.
A file descriptor is associated directly with a file, not with any particular path, notwithstanding the fact that under many circumstances, you obtain one via a path. This has several consequences, among them that once a process opens a file, that file cannot be pulled out from under it by manipulating the directory tree. And that is the basis for one of the standard approaches to your problem: unlink (delete) it immediately after opening it, before losing its name. Example:
#include <stdlib.h>
#include <unistd.h>
int make_temp_file() {
char filename[] = "my_temp_file_XXXXXX";
int fd;
fd = mkstemp(filename);
if (fd == -1) {
// handle failure to open ...
} else {
// file successfully opened, now unlink it
int result = unlink(filename);
// ... check for and handle error conditions ...
}
return fd;
}
Not only does that (nearly) ensure that the temp file does not outlive the need for it, but it also prevents the contents from being accessible to users and processes to which the owning process does not explicitly grant access.

Even though this doesn't exactly answer the question you're asking about mkstemp, consider creating a temporary file that will automatically be deleted, unless you rename it.
Instead of mkstemp you could call open combined with the creation flag O_TMPFILE to create a temporary, unnamed file that is automatically deleted when the file is closed.
See open(2):
O_TMPFILE (since Linux 3.11)
Create an unnamed temporary regular file. The pathname argument
specifies a directory; an unnamed inode will be created
in that directory's filesystem. Anything written to the
resulting file will be lost when the last file descriptor is
closed, unless the file is given a name.
Instead of a filename, you call open with the path where you prefer to place the temporary file, like:
temp_fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
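If you would like to give the temporary file a permanent location/name, you can call linkat on it later. Note that the old path must be an empty string rather than NULL, and that AT_EMPTY_PATH requires the CAP_DAC_READ_SEARCH capability; open(2) describes an alternative that goes through /proc/self/fd for callers without it:
linkat(temp_fd, "", AT_FDCWD, "/path/for/file", AT_EMPTY_PATH);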
Note: Filesystem support is required for O_TMPFILE, but mainstream Linux filesystems do support it.
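Putting the pieces together, here is a minimal sketch of the whole sequence. It uses the /proc/self/fd variant that open(2) documents for callers without CAP_DAC_READ_SEARCH, so it assumes /proc is mounted. The function name write_then_publish and its parameters are made up for illustration.

```c
#define _GNU_SOURCE  /* for O_TMPFILE; must come before the includes */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes text to an unnamed temporary file in dir, then gives it the
 * name final_path. Returns 0 on success, -1 on error. */
int write_then_publish(const char *dir, const char *final_path,
                       const char *text)
{
    char proc_path[64];
    int fd = open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;  /* e.g. the filesystem does not support O_TMPFILE */

    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);  /* the unnamed file disappears here */
        return -1;
    }

    /* Link the still-unnamed inode into the directory tree. */
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    int rc = linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
                    AT_SYMLINK_FOLLOW);
    close(fd);      /* if linkat failed, nothing is left behind */
    return rc;
}
```

linkat fails with EEXIST if final_path already exists, so unlike rename() this does not silently replace an existing file.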

readlink gives you the name of your file from its file descriptor if you pass it the path /proc/self/fd/ followed by your fd.
Then call remove to delete the file, passing the name that readlink gave you.
ssize_t readlink(const char *path, char *buf, size_t bufsiz); (returns -1 and sets errno on failure)
int remove(const char *filename); (returns zero if successful, otherwise nonzero)
I hope something like this helps:
#include <limits.h>
#include <stdio.h>
#include <unistd.h>
int delete_file(int fd) {
    char link_path[64];
    char filename[PATH_MAX];
    /* Build the /proc/self/fd/<fd> path for this descriptor. */
    snprintf(link_path, sizeof link_path, "/proc/self/fd/%d", fd);
    /* readlink() does not NUL-terminate, so leave room for the terminator. */
    ssize_t len = readlink(link_path, filename, sizeof filename - 1);
    if (len == -1)
        return -1;
    filename[len] = '\0';
    if (remove(filename) != 0) {
        printf("The file is not Deleted\n");
        return -1;
    }
    printf("The file is Deleted successfully\n");
    return 0;
}
(Feel free to edit this. The buffer is sized with PATH_MAX; a longer path would be truncated.)

Why doesn't readdir () system call work the way it should (unexpected output)?

I am writing a C program like,
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void printdir (char*);
int main () {
printf ("Directory scan of /home: \n");
printdir ("/home/fahad/");
exit (0);
}
void printdir (char *dir) {
struct dirent *entry;
DIR *dp = opendir (dir);
if (dp == NULL) {
fprintf (stderr, "Cannot open dir:%s\n", dir);
return;
}
chdir (dir);
while ((entry = readdir(dp)) != NULL)
printf ("%s\n",entry -> d_name);
closedir (dp);
}
Interestingly, it shows output in an unexpected way.
Whenever a directory is created in UNIX, two entries are first created inside it: one is . and the other is .. So basically their inode numbers should be less than those of the directory entries created through mkdir () or open () (for directories and files respectively).
My question is, in what order does the readdir () system call read the directory entries? Because I don't get . and .. as the first two entries.
Why is that so?

Try skipping the "." and ".." entries, as follows:
DIR* dirp;
struct dirent *dp=NULL;
char* fname;
if( !(dirp=opendir(dname)) ) {
int ec=errno;
printf("completed:-1:cannot opendir %s (%d)\n",dname,ec);
return(-1);
}
while ((dp = readdir(dirp)) != NULL) {
if( strcmp(dp->d_name,".")==0 ) continue;
if( strcmp(dp->d_name,"..")==0 ) continue;
fname=dp->d_name;
sprintf(pathname,"%s/%s",dname,fname);
}
See this answer, which notes that since the order is not stated as predictable, one should not assume any order. The above code gives a sample of how to handle (skip) these entries (in the typical use case of traversing a directory hierarchy). The order is probably based upon the order in which the files appear in the directory's inode.

readdir() doesn't return entries in any particular order. As others mentioned, the order will depend on the particular file system in question.
For example, the Berkeley UFS file system uses an unsorted linked-list. See the description of the direct structure on page 744 of http://ptgmedia.pearsoncmg.com/images/0131482092/samplechapter/mcdougall_ch15.pdf. The binary content of a directory consists of a stream of variable-length records, each of which contains the inode number, record length, string length (of the filename) and the string data itself. readdir() works by walking the linked list (using the record length to know where each record begins relative to the previous record) and returning whatever it finds.
The list of records is not typically optimized, so filenames appear on the list (more or less) in the order the files were created. But not quite, because holes (resulting from deleted files) will be filled with new filenames if they are small enough to fit.
Now, not all file systems represent directories the way UFS does. A file system that keeps directory data in a binary tree may choose to implement readdir() as an in-order traversal of that tree, which would present files sorted by whatever attributes it uses as key for the tree. Or it might use a pre-order traversal, which would not return the records in a sorted order.
Because applications cannot know the nature of the file system's implementation (and because each mounted volume can potentially use a different file system), applications should never assume anything about the order of entries that readdir() returns. If they require the entries to be sorted, they must read the entire directory into memory and do their own sorting.
This is why, for example, the ls command can take a long time to display output when run against a large directory. It needs to sort the entire list of names (and determine the longest name, in order to compute the column width) before it can display any output. This is also why ls -1U (disable sorting and display in one column) will produce output immediately on such directories.
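Reading the whole directory into memory and sorting it, as described above, does not have to be written by hand: POSIX provides scandir(), which collects the entries, and alphasort(), a ready-made comparison function. A minimal sketch (the function name print_sorted is made up for illustration):

```c
#define _XOPEN_SOURCE 700  /* expose scandir() and alphasort() */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Prints the entries of dir in alphasort() order.
 * Returns the number of entries, or -1 if the directory cannot be read. */
int print_sorted(const char *dir)
{
    struct dirent **names;
    int n = scandir(dir, &names, NULL, alphasort);
    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        puts(names[i]->d_name);
        free(names[i]);  /* each entry is allocated separately */
    }
    free(names);
    return n;
}
```

Passing NULL as the filter keeps every entry, including . and .., so they appear in sorted position rather than wherever the file system happens to store them.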

First two entries are created inside this directory: one is . and the other is .. So basically their inode numbers should be less than those of the directory entries created through mkdir () or open () (for directory and file respectively).
Yes, your understanding of the inode numbers holds here. To validate this we can write a simple C++ program that stores each inode/name pair in a map.
std::map<ino_t, std::string> entries;
std::pair<ino_t, std::string> en;
while ((entry = readdir(dp)) != NULL) {
en.first = entry->d_ino;
en.second = entry->d_name;
entries.insert(en);
printf ("%s\n",entry -> d_name);
}
"entries in GDB"
================
[5114862] = "..",
[5114987] = ".",
[5115243] = "taop",
[5115623] = "c++11_study",
[5115651] = "volume-3",
[5115884] = "gtkmm",
[5116513] = "basic",
[5116733] = "program",
[5116794] = "bakwas",
[5116813] = "a.out",
[5116818] = "foo",
This confirms the ordering of the inode numbers: "." and ".." have lower inode numbers than the other directory and file entries.
My question is, in what order does the readdir () system call read the directory entries? Because I don't get . and .. as the first two entries. Why is that so?
From the book "Advanced Programming in the UNIX® Environment" by W. Richard Stevens, we can get the following:
The opendir function initializes things so that the first readdir reads the first entry in the directory. The ordering of entries within the directory is implementation dependent and is usually not alphabetical.
So the order is not defined, and for the above program readdir() returned the entries in the following order.
Output from readdir()
=====================
c++11_study
taop
volume-3
basic
.
gtkmm
foo
program
a.out
..
bakwas

C++ / C: Move Directory to Another Location

I want to move the contents of one directory to another. I specify the source and destination directories via command line arguments. Here's the code:
#include <stdlib.h>
#include <stdio.h>
void move_dir(FILE *src, FILE *dest) {
int c = getc(src);
while(getc(src)!=EOF) {
putc(c,dest);
}
}
int main(int argc, char* argv[])
{
FILE *src=fopen(argv[1]);
FILE *dest=fopen(argv[2]);
while(--argc>0) {
if(src!=NULL && dest!=NULL) {
move_dir(src,dest);
}
}
fclose(src);
fclose(dest);
return 0;
}
For example:
./a.out /Folder1/Folder2/Source /Folder1
This will move the folder called Source inside of Folder1. However when I execute this code it doesn't work. It compiles just fine with g++ and no errors when running but it just doesn't move anything at all. Any ideas on what could be wrong?

Edit: This is referring to the original post, which read FILE * src = opendir( argv[1] );.
The function opendir() returns a DIR *, which is quite different from a FILE * (and cannot be used as a parameter to getc() / putc()).
You have to read directory entries from that DIR * using readdir(), which will yield a filename, and then copy that file using that information.
Edit: This is referring to the updated post.
You don't use file functions (fopen(), getc() etc.) on directories. The way to go is opendir() followed by readdir(), then acting on the yielded filenames.
I don't really know why fopen() on a directory actually returns a non-null pointer. Personally, I consider this a design flaw, as the operations possible on FILE * are not defined for directories. I would stay well clear of this construct.
Generally speaking, you should read the documentation (man page) of the functions you are using, not (wrongly) assuming things about them. And while you are at it, check return values, too - they might tell you why things don't work as expected.
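The opendir()/readdir() approach described above can be sketched as follows. Using rename() to act on each yielded filename is my own choice for illustration, not something the answer prescribes, and it only works when source and destination are on the same filesystem (otherwise it fails with EXDEV and the data has to be copied). The function name move_contents is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Moves every entry of src_dir into dst_dir (created if missing).
 * Returns the number of entries moved, or -1 on the first error. */
int move_contents(const char *src_dir, const char *dst_dir)
{
    DIR *dp = opendir(src_dir);
    if (dp == NULL)
        return -1;
    if (mkdir(dst_dir, 0755) != 0 && errno != EEXIST) {
        closedir(dp);
        return -1;
    }

    int moved = 0;
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        /* Skip the directory's own "." and ".." entries. */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        char from[PATH_MAX], to[PATH_MAX];
        snprintf(from, sizeof from, "%s/%s", src_dir, entry->d_name);
        snprintf(to, sizeof to, "%s/%s", dst_dir, entry->d_name);
        if (rename(from, to) != 0) {
            perror(from);
            closedir(dp);
            return -1;
        }
        moved++;
    }
    closedir(dp);
    return moved;
}
```

If the goal is to move the directory itself rather than its contents, a single rename(src_dir, new_path) call does that, under the same same-filesystem restriction.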

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?

You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
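The readlink-then-verify procedure described above can be sketched like this. It is a minimal, Linux-only sketch that assumes /proc is mounted; the function name fd_to_path is made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stores the current path of fd in buf. Returns 0 on success, -1 if the
 * descriptor has no usable path (pipe, socket, deleted or moved file). */
int fd_to_path(int fd, char *buf, size_t bufsize)
{
    char link[64];
    struct stat by_fd, by_name;

    if (bufsize == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t len = readlink(link, buf, bufsize - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';           /* readlink() does not terminate */

    if (buf[0] != '/')         /* e.g. "pipe:[1538488]" */
        return -1;

    /* Verify that the name still refers to the file we have open. */
    if (fstat(fd, &by_fd) != 0 || stat(buf, &by_name) != 0)
        return -1;
    if (by_fd.st_dev != by_name.st_dev || by_fd.st_ino != by_name.st_ino)
        return -1;
    return 0;
}
```

A caller would typically pass a buffer of PATH_MAX bytes. A path that does not fit is truncated by readlink() and will then normally fail the verification step.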

I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argument
must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.

In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.

As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).

Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.

You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
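For the case where the directory is known, the fstat()/readdir() search can be sketched as below. One deviation from the description, labeled here as my own choice: each entry is checked with lstat() rather than by looking at d_ino alone, because the device number must match as well and struct dirent does not carry it. The function name find_name_in_dir is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/* Searches dir (not recursively) for an entry that is the same file as
 * fd. On success copies the entry name into name and returns 0;
 * returns -1 if no entry matches or on error. */
int find_name_in_dir(int fd, const char *dir, char *name, size_t size)
{
    struct stat target, candidate;
    char path[PATH_MAX];
    struct dirent *entry;
    int found = -1;

    if (fstat(fd, &target) != 0)
        return -1;
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    while (found != 0 && (entry = readdir(dp)) != NULL) {
        snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (lstat(path, &candidate) == 0 &&
            candidate.st_dev == target.st_dev &&
            candidate.st_ino == target.st_ino) {
            snprintf(name, size, "%s", entry->d_name);
            found = 0;  /* stop at the first matching hard link */
        }
    }
    closedir(dp);
    return found;
}
```

Searching the whole filesystem means applying the same check while recursing into every subdirectory, which is where the cost mentioned above comes from.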

There is no official API to do this on OpenBSD, though with some very convoluted workarounds it is still possible with the following code. Note that you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead, provided that location wasn't already searched through by FTS. The search will be slower on filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to hit a race condition, though it is still very unlikely given how fast the search is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C; most of the code logic will be the same. If I have time I'll try to rewrite this in C.
Like macOS, this solution gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the early exit (the goto finish) from the while loop and return a vector if you don't care about being cross-platform and want to get all the hard links. DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually.
If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id, rather than being limited to the calling process, while also potentially getting all hard links instead of a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straightforward and to the point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process. However, it should work for most Unix-likes out of the box, with all the same concerns as the former solution regarding hard links and race conditions, although it performs slightly faster because it has fewer branches and loops:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution, also limited to the calling process: wrap all your calls to fopen() and open() in a helper function that stores, in whatever C equivalent of a std::unordered_map you have, the file descriptor paired with the absolute version of the path passed to your fopen()/open() wrappers (and to the Windows-only equivalents, which won't work on UWP, such as _wopen_s() and the like needed to support UTF-8). The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (*W for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't nearly as commonly used on Windows), and realpath() / GetFullPathNameW() will convert the path of the file you opened from a relative path, if it is one, to an absolute path.
With the file descriptor and absolute path stored in a C equivalent of a std::unordered_map (which you will likely have to write yourself using malloc()'d and eventually free()'d int and C-string arrays), this will again be faster than any solution that does a dynamic search of your filesystem. But it has a different and unappealing limitation: it will not notice files which were moved around on your filesystem. You can at least check whether the file was deleted by testing for existence in your own code, but it also won't notice whether the file was replaced since the time you opened it and stored the path for the descriptor in memory, thus potentially giving you outdated results. Let me know if you would like to see a code example of this, though because files can change location I do not recommend this solution.
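A minimal sketch of the wrapper idea, for Unix-likes only. Instead of a hash map it uses a fixed-size array indexed by the descriptor, which is my own simplification (descriptors are small integers). The names tracked_open, tracked_path, tracked_close and MAX_TRACKED_FDS are made up for illustration, and the stored path has all the staleness problems described above.

```c
#define _XOPEN_SOURCE 700  /* expose realpath() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED_FDS 1024  /* assumption: larger fds are not tracked */

/* Table from descriptor number to the malloc()'d absolute path. */
static char *fd_paths[MAX_TRACKED_FDS];

/* open() wrapper that remembers the resolved path of what it opened. */
int tracked_open(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);
    if (fd >= 0 && fd < MAX_TRACKED_FDS) {
        free(fd_paths[fd]);
        fd_paths[fd] = realpath(path, NULL);  /* NULL if it cannot resolve */
    }
    return fd;
}

/* Returns the path recorded at open time, or NULL if none is known. */
const char *tracked_path(int fd)
{
    return (fd >= 0 && fd < MAX_TRACKED_FDS) ? fd_paths[fd] : NULL;
}

/* close() wrapper that forgets the recorded path. */
int tracked_close(int fd)
{
    if (fd >= 0 && fd < MAX_TRACKED_FDS) {
        free(fd_paths[fd]);
        fd_paths[fd] = NULL;
    }
    return close(fd);
}
```

The lookup is a single array access, but the result reflects only where the file was when it was opened.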

Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.
