How to check if a path specifies a volume root directory - c

How can I check whether a given path (absolute or relative) specifies the root directory of a volume with POSIX or standard C runtime calls? Ideally, the code should work on both, Linux and Mac OS X.

First, all you can really check using standard functions is if the path presented is a mount point, and that mount point may or may not be the root of its filesystem.
Assuming your system assigns a unique f_fsid value to each mount point, you can use the POSIX-stanard statvfs() function and compare the f_fsid field of the relevant statvfs structure of the path component with its parent:
#include <stdlib.h>
#include <string.h>
#include <sys/statvfs.h>
int isMountPoint( const char *path )
{
struct statvfs sv1, sv2;
char *realP = realpath( path, NULL );
// if the real path is "/", yeah, it's the root
if ( !strcmp( realP, "/" ) )
{
free( realP );
return( 1 );
}
statvfs( realP, &sv1 );
// find the parent by truncating at the last "/"
char *pp = strrchr( realP, '/' );
// if there is no parent, must be root
if ( NULL == pp )
{
free( realP );
return( 1 );
}
// keep the final / (makes handling things like "/mnt"
// much easier)
pp++;
*pp = '\0';
statvfs( realP, &sv2 );
free( realP );
// if f_fsid differs, it's the root of
// the mounted filesystem
return( sv1.f_fsid != sv2.f_fsid );
}
All error checking is left as an exercise (and it would also make this example longer and harder to understand...).
If the f_fsid of a path differs from its parent's f_fsid, or if the path is the root directory itself, the path passed in must be the mount point of its filesystem. Note that this does not have to be the actual root of the filesystem. For example, a host can export a file system via NFS, and an NFS client can mount any subdirectory in the remote file system as the "local root", or a loopback mount can refer to a subdirectory of another filesystem.

Well, you have several ways. Historically, the root inode has had inum of 2, so just stat(path, &buf); and check if (buf.st_ino == 2).
Recently, with proliferation of different filesystem types, there's no warranty of having inum equal to 2 in the root inode, so you must follow a different approach: Just check if the given path and the parent directory (just append "/.." to path) reside in different devices (st_dev field of struct stat info)
The following code illustrates this:
#include <unistd.h>
#include <limits.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
int is_mount(char *path)
{
struct stat sdir; /* inode info */
struct stat spdir; /* parent inode info */
char buffer[PATH_MAX];
int res = stat(path, &sdir);
if (res < 0) return -1;
if (snprintf(buffer, sizeof buffer, "%s/..", path) >= sizeof buffer) {
errno = ENAMETOOLONG;
return -1;
}
res = stat(buffer, &spdir);
if (res < 0) return -1;
return sdir.st_dev != spdir.st_dev; /* SEE ADDITIONAL NOTE */
}
int main(int argc, char **argv)
{
int i;
for (i = 1; i < argc; i++) {
int res = is_mount(argv[i]);
if ( res < 0 ) {
fprintf(stderr,
"%s: %s (errno = %d)\n",
argv[i], strerror(errno), errno);
continue;
}
printf("%5s\t%s\n", res ? "true" : "false", argv[i]);
} /* for */
} /* main */
Execution on a FreeBSD system:
$ ismount /home/luis/usr.ports/german/geonext/Makefile /home/luis/usr.ports/german/geonext /home/luis/usr.ports/german /home/luis/usr.ports /home/luis /home / /var/run/cups /var/run /var pru3.c
/home/luis/usr.ports/german/geonext/Makefile: Not a directory (errno = 20)
pru3.c: Not a directory (errno = 20)
false /home/luis/usr.ports/german/geonext
false /home/luis/usr.ports/german
true /home/luis/usr.ports
false /home/luis
false /home
true / <-- in the above code, this returns false, see ADDITIONAL NOTE.
false /var/run/cups
true /var/run
false /var
PORTABILITY NOTE
This should work in any un*x/linux that implements the stat(2) system call. UNIX v7 already implements this, so it's supposed to be found in all unices available.
ADDITIONAL NOTE
This approach does not work for the actual filesystem root (/) because the parent dir and the root dir both point to the same inode. This can be solved by checking the st_ino of both parent and child inodes to be equal. Just change the test for
return (sdir.st_dev != spdir.st_dev) /* different devices */
|| ( /* sdir.st_dev == spdir.st_dev && ---redundant */
sdir.st_ino == spdir.st_ino); /* root dir case */

POSIX has a realpath function that returns the canonical path of any given path (thus eliminating/resolving dots, dotdots, links, etc).
Now POSIX doesn't define volume, only the hierarchy of filenames from a unique given root /. Now Unix variants are able to mount file trees one on top of other to obtain a single tree at runtime. There is no standard way to "mount" these volumes, even if mount is a very common way, because there is plenty of ways to realize this feature. So you have to read the documentation of your OS variant to determine how you can obtain the list of all mounted points.
You may also read Linux function to get mount points and How to get mount point information from an API on Mac?.

Related

Printing Hardlinks of an Inode in C

I'm writing a program in C to scan the files in a directory, get its inode number, get the hardlink count and print out the hardlinks. So in printing out the hardlinks, i search files from root and match the files with the same inode. However when i set the path to find the matching inode it does not show any files. On the other hand, if i set the path to the same directory i scanned initially it displays one file as a hardlink. I'm open to any other way to display the hardlinks of an inode.
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include<sys/dir.h>
#include<stdlib.h>
void listFilesRecursively(char *path);
void filter(char *basePath, long inode);
void get_hardLinks(long inode);
int main()
{
// Directory path to list files
char path[100]="../";
listFilesRecursively(path);
return 0;
}
void listFilesRecursively(char *basePath)
{
char path[1000];
struct dirent *dp;
struct stat sb;
DIR *dir = opendir(basePath);
struct dirent **namelist = NULL;
// Unable to open directory stream
if (!dir)
return;
while ((dp = readdir(dir)) != NULL)
{
stat(dp->d_name, &sb);
if (strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0)
{
printf("Inode:%lu | %s |%ld \n",(unsigned long)dp-> d_ino, dp->d_name,(long) sb.st_nlink);
strcpy(path, basePath);
strcat(path, "/");
strcat(path, dp->d_name);
listFilesRecursively(path);
get_hardLinks((unsigned long)dp-> d_ino);
}
}
closedir(dir);
}
void filter(char *basePath, long inode){
char path[1000];
struct dirent *dp;
struct stat sb;
DIR *dir = opendir(basePath);
while ((dp = readdir(dir)) != NULL)
{
stat(dp->d_name, &sb);
if (strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0 && (unsigned long)dp->d_ino==inode && ((long)sb.st_nlink<=10))
{
printf("----HardLink: Inode:%lu | %s \n",(unsigned long)dp-> d_ino, dp->d_name);
strcpy(path, basePath);
strcat(path, "/");
strcat(path, dp->d_name);
filter("/",inode);
}
}
closedir(dir);
}
void get_hardLinks(long inode){
filter("../",inode);
}
The ASK seems to be
List of allfiles (inode, name, #links), in DFS order
for each file show all hard links
It does not explicitly state how symlink should be handled. Answer assumed they can be ignored (including symlink to other directories, which may result in infinite loops).
The current solution has few problems. The first cause incorrect results when scanning over multiple directories, the other may cause serious performance issues when the number of files goes up.
Implementation issue:
Missing Path: The 'stat' calls in listFilesRecursively and filter just pass the file name. This will cause a lookup in the current directory (pwd). But the files is in directory basepath. Code need to form the full pathname by combining the two with a '/'.
Error checking on all system calls: Not testing on 'stat', 'opendir', etc, will have serious impact on the code on any error - either program crash, partial results, on infinite loops.
Performance:
The function listFilesRecursively(path) is being called for each entry. It should only be called for directory entries. (S_ISDIR on the mode).
The function get_hardLinks is getting called for every file ,but it should only get called for files with more than 1 hard link (st_nlink > 1)
The filter will perform recursive call on any entry. Should restrict recursive calls to directories, same as listFilesRecursively above.
Starting point: the question indicate search should start at root, program is starting all searches at the parent folder.
Also, if this is going to run on a large file system in production, consider the following performance tuning:
If the total number of files in the scanned tree is X, current performance will be O(XX), on IO. Even with above improvements, the performance will be O( KX ), where K is the number of files with hardlinks >1, that will cause a scan.
Alternative implementation will perform
Single scan (O(n), remember the i-nodes of all files with hardlinks>1 in dynamically resized array
Sort the array by inodes
Print the files with identical hardlinks from memory.
This will be O(n) for IO, and O(n log n) for sorting. Practically O(n) because IO will dominate the calls. Much faster the current logic.

How can I determine the mount path of a given file on Linux in C?

I have an arbitrary file for which I would like to determine the mount point. Let's say that it is /mnt/bar/foo.txt, where we have the following mount points in addition to the "normal" Linux mount points:
[some device mounted to] -> /mnt/bar
[some device mounted to] -> /mnt/other
I've taken a look at stat() and statvfs(). statvfs() can give me the filesystem id, and stat can give me the id of the device, but neither of these can really correlate to the mount point.
I'm thinking that what I will have to do is call readlink() on the arbitrary file, and then read through /proc/mounts, figuring out which path most closely matches the filename. Is this a good approach, or is there some great libc function I'm missing out on to to this?
You can do it with a combination of getfsent to iterate through the list of devices, and stat to check if your file is on that device.
#include <fstab.h> /* for getfsent() */
#include <sys/stat.h> /* for stat() */
struct fstab *getfssearch(const char *path) {
/* stat the file in question */
struct stat path_stat;
stat(path, &path_stat);
/* iterate through the list of devices */
struct fstab *fs = NULL;
while( (fs = getfsent()) ) {
/* stat the mount point */
struct stat dev_stat;
stat(fs->fs_file, &dev_stat);
/* check if our file and the mount point are on the same device */
if( dev_stat.st_dev == path_stat.st_dev ) {
break;
}
}
return fs;
}
Note, for brevity there's no error checking there. Also getfsent is not a POSIX function, but it is a very widely used convention. It works here on OS X which doesn't even use /etc/fstab. It is also not thread safe.

How to list first level directories only in C?

In a terminal I can call ls -d */. Now I want a c program to do that for me, like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
int main( void )
{
int status;
char *args[] = { "/bin/ls", "-l", NULL };
if ( fork() == 0 )
execv( args[0], args );
else
wait( &status );
return 0;
}
This will ls -l everything. However, when I am trying:
char *args[] = { "/bin/ls", "-d", "*/", NULL };
I will get a runtime error:
ls: */: No such file or directory
The lowest-level way to do this is with the same Linux system calls ls uses.
So look at the output of strace -efile,getdents ls:
execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768) = 840
getdents(3, /* 0 entries */, 32768) = 0
...
getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
The lowest-level portable way (portable to POSIX systems), is to use the libc functions to open a directory and read the entries. POSIX doesn't specify the exact system call interface, unlike for non-directory files.
These functions:
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
can be used like this:
// print all directories, and symlinks to directories, in the CWD.
// like sh -c 'ls -1UF -d */' (single-column output, no sorting, append a / to dir names)
// tested and works on Linux, with / without working d_type
#define _GNU_SOURCE // includes _BSD_SOURCE for DT_UNKNOWN etc.
#include <dirent.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
DIR *dirhandle = opendir("."); // POSIX doesn't require this to be a plain file descriptor. Linux uses open(".", O_DIRECTORY); to implement this
//^Todo: error check
struct dirent *de;
while(de = readdir(dirhandle)) { // NULL means end of directory
_Bool is_dir;
#ifdef _DIRENT_HAVE_D_TYPE
if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
// don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
is_dir = (de->d_type == DT_DIR);
} else
#endif
{ // the only method if d_type isn't available,
// otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
struct stat stbuf;
// stat follows symlinks, lstat doesn't.
stat(de->d_name, &stbuf); // TODO: error check
is_dir = S_ISDIR(stbuf.st_mode);
}
if (is_dir) {
printf("%s/\n", de->d_name);
}
}
}
There's also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page. (not the Linux stat(2) man page; it has a different example).
The man page for readdir(3) says the Linux declaration of struct dirent is:
struct dirent {
ino_t d_ino; /* inode number */
off_t d_off; /* not an offset; see NOTES */
unsigned short d_reclen; /* length of this record */
unsigned char d_type; /* type of file; not supported
by all filesystem types */
char d_name[256]; /* filename */
};
d_type is either DT_UNKNOWN, in which case you need to stat to learn anything about whether the directory entry is itself a directory. Or it can be DT_DIR or something else, in which case you can be sure it is or isn't a directory without having to stat it.
Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).
Sometimes you're just matching on filenames, and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and that you have some way to detect the breakage when someone in the future tries to use this code on another FS type.)
Unfortunately, all solutions based on shell expansion are limited by the maximum command line length. Which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices -- as did Bill Gates on 640 kilobytes, once.
(When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
Fortunately, there are several solutions. One is to use find instead:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
You can also format the output as you wish, not depending on locale:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -t= for sort to use \0 as the separator, too. tr will convert them to newlines for you:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\0' | sort -t= | tr -s '\0' '\n'");
If you want the names in an array, use glob() function instead.
Finally, as I like to harp every now and then, one can use the POSIX nftw() function to implement this internally:
#define _GNU_SOURCE
#include <stdio.h>
#include <ftw.h>
#define NUM_FDS 17
int myfunc(const char *path,
const struct stat *fileinfo,
int typeflag,
struct FTW *ftwinfo)
{
const char *file = path + ftwinfo->base;
const int depth = ftwinfo->level;
/* We are only interested in first-level directories.
Note that depth==0 is the directory itself specified as a parameter.
*/
if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
return 0;
/* Don't list names starting with a . */
if (file[0] != '.')
printf("%s/\n", path);
/* Do not recurse. */
return FTW_SKIP_SUBTREE;
}
and the nftw() call to use the above is obviously something like
if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
/* An error occurred. */
}
The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.
You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.
The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
If you want to create a test directory with lots of subdirectories, use something like the following Bash:
mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
On my system, running
ls -d */
in that directory yields bash: /bin/ls: Argument list too long error, while the find command and the nftw() based program all run just fine.
You also cannot remove the directories using rmdir directory-*/ for the same reason. Use
find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
instead. Or just remove the entire directory and subdirectories,
cd ..
rm -rf lots-of-subdirs
Just call system. Globs on Unixes are expanded by the shell. system will give you a shell.
You can avoid the whole fork-exec thing by doing the glob(3) yourself:
int ec;
glob_t gbuf;
if(0==(ec=glob("*/", 0, NULL, &gbuf))){
char **p = gbuf.gl_pathv;
if(p){
while(*p)
printf("%s\n", *p++);
}
}else{
/*handle glob error*/
}
You could pass the results to a spawned ls, but there's hardly a point in doing that.
(If you do want to do fork and exec, you should start with a template that does proper error checking -- each of those calls may fail.)
If you are looking for a simple way to get a list of folders into your program, I'd rather suggest the spawnless way, not calling an external program, and use the standard POSIX opendir/readdir functions.
It's almost as short as your program, but has several additional advantages:
you get to pick folders and files at will by checking the d_type
you can elect to early discard system entries and (semi)hidden entries by testing the first character of the name for a .
you can immediately print out the result, or store it in memory for later use
you can do additional operations on the list in memory, such as sorting and removing other entries that don't need to be included.
#include <stdio.h>
#include <sys/types.h>
#include <sys/dir.h>
int main( void )
{
DIR *dirp;
struct dirent *dp;
dirp = opendir(".");
while ((dp = readdir(dirp)) != NULL)
{
if (dp->d_type & DT_DIR)
{
/* exclude common system entries and (semi)hidden names */
if (dp->d_name[0] != '.')
printf ("%s\n", dp->d_name);
}
}
closedir(dirp);
return 0;
}
Another less low-level approach, with system():
#include <stdlib.h>
int main(void)
{
system("/bin/ls -d */");
return 0;
}
Notice with system(), you don't need to fork(). However, I recall that we should avoid using system() when possible!
As Nomimal Animal said, this will fail when the number of subdirectories is too big! See his answer for more...

Get directory path by fd

I've run into the need to be able refer to a directory by path given its file descriptor in Linux. The path doesn't have to be canonical, it just has to be functional so that I can pass it to other functions. So, taking the same parameters as passed to a function like fstatat(), I need to be able to call a function like getxattr() which doesn't have a f-XYZ-at() variant.
So far I've come up with these solutions; though none are particularly elegant.
The simplest solution is to avoid the problem by calling openat() and then using a function like fgetxattr(). This works, but not in every situation. So another method is needed to fill the gaps.
The next solution involves looking up the information in proc:
if (!access("/proc/self/fd",X_OK)) {
sprintf(path,"/proc/self/fd/%i/",fd);
}
This, of course, totally breaks on systems without proc, including some chroot environments.
The last option, a more portable but potentially-race-condition-prone solution, looks like this:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
The obvious problem here is that in a multithreaded app, changing the working directory around could have side effects.
However, the fact that it works is compelling: if I can get the path of a directory by calling fchdir() followed by getcwd(), why shouldn't I be able to just get the information directly: fgetcwd() or something. Clearly the kernel is tracking the necessary information.
So how do I get to it?
Answer
The way Linux implements getcwd in the kernel is this: it starts at the directory entry in question and prepends the name of the parent of that directory to the path string, and repeats that process until it reaches the root. This same mechanism can be theoretically implemented in user-space.
Thanks to Jonathan Leffler for pointing this algorithm out. Here is a link to the kernel implementation of this function: https://github.com/torvalds/linux/blob/v3.4/fs/dcache.c#L2577
The kernel thinks of directories differently from the way you do - it thinks in terms of inode numbers. It keeps a record of the inode number (and device number) for the directory, and that is all it needs as the current directory. The fact that you sometimes specify a name to it means it goes and tracks down the inode number corresponding to that name, but it preserves only the inode number because that's all it needs.
So, you will have to code a suitable function. You can open a directory directly with open() precisely to get a file descriptor that can be used by fchdir(); you can't do anything else with it on many modern systems. You can also fail to open the current directory; you should be testing that result. The circumstances where this happens are rare, but not non-existent. (A SUID program might chdir() to a directory that the SUID privileges permit, but then drop the SUID privileges leaving the process unable to read the directory; the getcwd() call will fail in such circumstances too - so you must error check that, too!) Also, if a directory is removed while your (possibly long-running) process has it open, then a subsequent getcwd() will fail.
Always check results from system calls; there are usually circumstances where they can fail, even though it is dreadfully inconvenient of them to do so. There are exceptions - getpid() is the canonical example - but they are few and far between. (OK: not all that far between - getppid() is another example, and it is pretty darn close to getpid() in the manual; and getuid() and relatives are also not far off in the manual.)
Multi-threaded applications are a problem; using chdir() is not a good idea in those. You might have to fork() and have the child evaluate the directory name, and then somehow communicate that back to the parent.
bignose asks:
This is interesting, but seems to go against the querent's reported experience: that getcwd knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?
For this, it helps to understand how - or at least one mechanism by which - the getcwd() function can be written. Ignoring the issue of 'no permission', the basic mechanism by which it works is:
Use stat on the root directory '/' (so you know when to stop going upwards).
Use stat on the current directory '.' (so you know where you are); this gives you a current inode.
Until you reach the root directory:
Scan the parent directory '..' until you find the entry with the same inode as the current inode; this gives you the next component name of the directory path.
And then change the current inode to the inode of '.' in the parent directory.
When you reach root, you can build the path.
Here is an implementation of that algorithm. It is old code (originally 1986; the last non-cosmetic changes were in 1998) and doesn't make use of fchdir() as it should. It also works horribly if you have NFS automounted file systems to traverse - which is why I don't use it any more. However, this is roughly equivalent to the basic scheme used by getcwd(). (Ooh; I see a 18 character string ("../123456789.abcd") - well, back when it was written, the machines I worked on only had the very old 14-character only filenames - not the modern flex names. Like I said, it is old code! I haven't seen one of those file systems in what, 15 years or so - maybe longer. There is also some code to mess with longer names. Be cautious using this.)
/*
#(#)File: $RCSfile: getpwd.c,v $
#(#)Version: $Revision: 2.5 $
#(#)Last changed: $Date: 2008/02/11 08:44:50 $
#(#)Purpose: Evaluate present working directory
#(#)Author: J Leffler
#(#)Copyright: (C) JLSS 1987-91,1997-98,2005,2008
#(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#define _POSIX_SOURCE 1
#include "getpwd.h"
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#if defined(_POSIX_SOURCE) || defined(USG_DIRENT)
#include "dirent.h"
#elif defined(BSD_DIRENT)
#include <sys/dir.h>
#define dirent direct
#else
What type of directory handling do you have?
#endif
#define DIRSIZ 256
typedef struct stat Stat;
static Stat root;
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_getpwd_c[] = "#(#)$Id: getpwd.c,v 2.5 2008/02/11 08:44:50 jleffler Exp $";
#endif /* lint */
/* -- Routine: inode_number */
static ino_t inode_number(char *path, char *name)
{
ino_t inode;
Stat st;
char buff[DIRSIZ + 6];
strcpy(buff, path);
strcat(buff, "/");
strcat(buff, name);
if (stat(buff, &st))
inode = 0;
else
inode = st.st_ino;
return(inode);
}
/*
-- Routine: finddir
Purpose: Find name of present working directory
Given:
In: Inode of current directory
In: Device for current directory
Out: pathname of current directory
In: Length of buffer for pathname
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Rewritten to use opendir/readdir/closedir
25/09/90 JL Modified to pay attention to length
10/11/98 JL Convert to prototypes
*/
static int finddir(ino_t inode, dev_t device, char *path, size_t plen)
{
register char *src;
register char *dst;
char *end;
DIR *dp;
struct dirent *d_entry;
Stat dotdot;
Stat file;
ino_t d_inode;
int status;
static char name[] = "../123456789.abcd";
char d_name[DIRSIZ + 1];
if (stat("..", &dotdot) || (dp = opendir("..")) == 0)
return(-1);
/* Skip over "." and ".." */
if ((d_entry = readdir(dp)) == 0 ||
(d_entry = readdir(dp)) == 0)
{
/* Should never happen */
closedir(dp);
return(-1);
}
status = 1;
while (status)
{
if ((d_entry = readdir(dp)) == 0)
{
/* Got to end of directory without finding what we wanted */
/* Probably a corrupt file system */
closedir(dp);
return(-1);
}
else if ((d_inode = inode_number("..", d_entry->d_name)) != 0 &&
(dotdot.st_dev != device))
{
/* Mounted file system */
dst = &name[3];
src = d_entry->d_name;
while ((*dst++ = *src++) != '\0')
;
if (stat(name, &file))
{
/* Can't stat this file */
continue;
}
status = (file.st_ino != inode || file.st_dev != device);
}
else
{
/* Ordinary directory hierarchy */
status = (d_inode != inode);
}
}
strncpy(d_name, d_entry->d_name, DIRSIZ);
closedir(dp);
/**
** NB: we have closed the directory we are reading before we move out of it.
** This means that we should only be using one extra file descriptor.
** It also means that the space d_entry points to is now invalid.
*/
src = d_name;
dst = path;
end = path + plen;
if (dotdot.st_ino == root.st_ino && dotdot.st_dev == root.st_dev)
{
/* Found root */
status = 0;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
else if (chdir(".."))
status = -1;
else
{
/* RECURSE */
status = finddir(dotdot.st_ino, dotdot.st_dev, path, plen);
(void)chdir(d_name); /* We've been here before */
if (status == 0)
{
while (*dst)
dst++;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
}
if (dst >= end)
status = -1;
return(status);
}
/*
-- Routine: getpwd
Purpose: Evaluate name of current directory
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Short circuit if pwd = /
25/09/90 JL Revise interface; check length
10/11/98 JL Convert to prototypes
Known Bugs
----------
1. Uses chdir() and could possibly get lost in some other directory
2. Can be very slow on NFS with automounts enabled.
*/
char *getpwd(char *pwd, size_t plen)
{
int status;
Stat here;
if (pwd == 0)
pwd = malloc(plen);
if (pwd == 0)
return (pwd);
if (stat("/", &root) || stat(".", &here))
status = -1;
else if (root.st_ino == here.st_ino && root.st_dev == here.st_dev)
{
strcpy(pwd, "/");
status = 0;
}
else
status = finddir(here.st_ino, here.st_dev, pwd, plen);
if (status != 0)
pwd = 0;
return (pwd);
}
#ifdef TEST
#include <stdio.h>
/*
-- Routine: main
Purpose: Test getpwd()
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/90 JL Modified interface; use GETCWD to check result
*/
int main(void)
{
char pwd[512];
int pwd_len;
if (getpwd(pwd, sizeof(pwd)) == 0)
printf("GETPWD failed to evaluate pwd\n");
else
printf("GETPWD: %s\n", pwd);
if (getcwd(pwd, sizeof(pwd)) == 0)
printf("GETCWD failed to evaluate pwd\n");
else
printf("GETCWD: %s\n", pwd);
pwd_len = strlen(pwd);
if (getpwd(pwd, pwd_len - 1) == 0)
printf("GETPWD failed to evaluate pwd (buffer is 1 char short)\n");
else
printf("GETPWD: %s (but should have failed!!!)\n", pwd);
return(0);
}
#endif /* TEST */
Jonathan's answer is very fine in showing how it works. But it doesn't show a workaround for the situation you describe.
I would as well use something like you describe:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
but, in order to avoid race conditions in with threads, fork another process in order to do that.
That might sound expensive, but if you don't do that too often, it should be ok.
The idea is something like this (no runnable code, just a raw idea):
int fd[2];
pipe(fd);
pid_t pid;
if ((pid = fork()) == 0) {
// child; here we do the chdir etc. stuff
close(fd[0]); // read end
char path[PATH_MAX+1];
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
write(fd[1], path, strlen(path));
close(fd[1]);
_exit(EXIT_SUCCESS);
} else {
// parent; pid is our child
close(fd[1]); // write end
int cursor=0;
while ((r=read(fd[0], &path+cursor, PATH_MAX)) > 0) {
cursor += r;
}
path[cursor]='\0'; // make it 0-terminated
close(fd[0]);
wait(NULL);
}
I am not sure if this will resolve all issues, and I as well do not do any error checking, so that's what you should add.

Delete files while reading directory with readdir()

My code is something like this:
DIR* pDir = opendir("/path/to/my/dir");
struct dirent pFile = NULL;
while ((pFile = readdir())) {
// Check if it is a .zip file
if (subrstr(pFile->d_name,".zip") {
// It is a .zip file, delete it, and the matching log file
char zipname[200];
snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
unlink(zipname);
char* logname = subsstr(zipname, 0, strlen(pFile->d_name)-4); // Strip of .zip
logname = appendstring(&logname, ".log"); // Append .log
unlink(logname);
}
closedir(pDir);
(this code is untested and purely an example)
The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()?
Or will readdir() still find the deleted .log file?
Quote from POSIX readdir:
If a file is removed from or added to
the directory after the most recent
call to opendir() or rewinddir(),
whether a subsequent call to readdir()
returns an entry for that file is
unspecified.
So, my guess is ... it depends.
It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...
And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
Edit
I tested with this program:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void) {
struct dirent *de;
DIR *dd;
/* create files `one.zip` and `one.log` before entering the readdir() loop */
printf("creating `one.log` and `one.zip`\n");
system("touch one.log"); /* assume it worked */
system("touch one.zip"); /* assume it worked */
dd = opendir("."); /* assume it worked */
while ((de = readdir(dd)) != NULL) {
printf("found %s\n", de->d_name);
if (strstr(de->d_name, ".zip")) {
char logname[1200];
size_t i;
if (*de->d_name == 'o') {
/* create `two.zip` and `two.log` when the program finds `one.zip` */
printf("creating `two.zip` and `two.log`\n");
system("touch two.zip"); /* assume it worked */
system("touch two.log"); /* assume it worked */
}
printf("unlinking %s\n", de->d_name);
if (unlink(de->d_name)) perror("unlink");
strcpy(logname, de->d_name);
i = strlen(logname);
logname[i-3] = 'l';
logname[i-2] = 'o';
logname[i-1] = 'g';
printf("unlinking %s\n", logname);
if (unlink(logname)) perror("unlink");
}
}
closedir(dd); /* assume it worked */
return 0;
}
On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...
I'm testing my new Linux reference book. The Linux Programming Interface by Michael Kerrisk and it says the following:
SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from since the last since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned anymore whether or not you unlink the current dirent.
Also note the guarantee provided by the second sentence. Since you are leaving alone the other files and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is undefined. it may or may not be returned by readdir() but in your case, it shouldn't be harmful.
The reason why I have explored the question it is to find an efficient way to close file descriptors in a child process before exec().
The suggested way in APUE from Stevens is to do the following:
int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
close(i);
but I am thinking using code similar to what is found in the OP to scan /dev/fd/ directory to know exactly which fds I need to close. (Special note to myself, skip over dirfd contained in the DIR handle.)
I found the following page describe the solution of this problem.
https://support.apple.com/kb/TA21420

Resources