I've run into the need to be able refer to a directory by path given its file descriptor in Linux. The path doesn't have to be canonical, it just has to be functional so that I can pass it to other functions. So, taking the same parameters as passed to a function like fstatat(), I need to be able to call a function like getxattr() which doesn't have a f-XYZ-at() variant.
So far I've come up with these solutions; though none are particularly elegant.
The simplest solution is to avoid the problem by calling openat() and then using a function like fgetxattr(). This works, but not in every situation. So another method is needed to fill the gaps.
The next solution involves looking up the information in proc:
if (!access("/proc/self/fd",X_OK)) {
sprintf(path,"/proc/self/fd/%i/",fd);
}
This, of course, totally breaks on systems without proc, including some chroot environments.
The last option, a more portable but potentially-race-condition-prone solution, looks like this:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
The obvious problem here is that in a multithreaded app, changing the working directory around could have side effects.
However, the fact that it works is compelling: if I can get the path of a directory by calling fchdir() followed by getcwd(), why shouldn't I be able to just get the information directly: fgetcwd() or something. Clearly the kernel is tracking the necessary information.
So how do I get to it?
Answer
The way Linux implements getcwd in the kernel is this: it starts at the directory entry in question and prepends the name of the parent of that directory to the path string, and repeats that process until it reaches the root. This same mechanism can be theoretically implemented in user-space.
Thanks to Jonathan Leffler for pointing this algorithm out. Here is a link to the kernel implementation of this function: https://github.com/torvalds/linux/blob/v3.4/fs/dcache.c#L2577
The kernel thinks of directories differently from the way you do - it thinks in terms of inode numbers. It keeps a record of the inode number (and device number) for the directory, and that is all it needs as the current directory. The fact that you sometimes specify a name to it means it goes and tracks down the inode number corresponding to that name, but it preserves only the inode number because that's all it needs.
So, you will have to code a suitable function. You can open a directory directly with open() precisely to get a file descriptor that can be used by fchdir(); you can't do anything else with it on many modern systems. You can also fail to open the current directory; you should be testing that result. The circumstances where this happens are rare, but not non-existent. (A SUID program might chdir() to a directory that the SUID privileges permit, but then drop the SUID privileges leaving the process unable to read the directory; the getcwd() call will fail in such circumstances too - so you must error check that, too!) Also, if a directory is removed while your (possibly long-running) process has it open, then a subsequent getcwd() will fail.
Always check results from system calls; there are usually circumstances where they can fail, even though it is dreadfully inconvenient of them to do so. There are exceptions - getpid() is the canonical example - but they are few and far between. (OK: not all that far between - getppid() is another example, and it is pretty darn close to getpid() in the manual; and getuid() and relatives are also not far off in the manual.)
Multi-threaded applications are a problem; using chdir() is not a good idea in those. You might have to fork() and have the child evaluate the directory name, and then somehow communicate that back to the parent.
bignose asks:
This is interesting, but seems to go against the querent's reported experience: that getcwd knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?
For this, it helps to understand how - or at least one mechanism by which - the getcwd() function can be written. Ignoring the issue of 'no permission', the basic mechanism by which it works is:
Use stat on the root directory '/' (so you know when to stop going upwards).
Use stat on the current directory '.' (so you know where you are); this gives you a current inode.
Until you reach the root directory:
Scan the parent directory '..' until you find the entry with the same inode as the current inode; this gives you the next component name of the directory path.
And then change the current inode to the inode of '.' in the parent directory.
When you reach root, you can build the path.
Here is an implementation of that algorithm. It is old code (originally 1986; the last non-cosmetic changes were in 1998) and doesn't make use of fchdir() as it should. It also works horribly if you have NFS automounted file systems to traverse - which is why I don't use it any more. However, this is roughly equivalent to the basic scheme used by getcwd(). (Ooh; I see a 18 character string ("../123456789.abcd") - well, back when it was written, the machines I worked on only had the very old 14-character only filenames - not the modern flex names. Like I said, it is old code! I haven't seen one of those file systems in what, 15 years or so - maybe longer. There is also some code to mess with longer names. Be cautious using this.)
/*
#(#)File: $RCSfile: getpwd.c,v $
#(#)Version: $Revision: 2.5 $
#(#)Last changed: $Date: 2008/02/11 08:44:50 $
#(#)Purpose: Evaluate present working directory
#(#)Author: J Leffler
#(#)Copyright: (C) JLSS 1987-91,1997-98,2005,2008
#(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#define _POSIX_SOURCE 1
#include "getpwd.h"
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#if defined(_POSIX_SOURCE) || defined(USG_DIRENT)
#include "dirent.h"
#elif defined(BSD_DIRENT)
#include <sys/dir.h>
#define dirent direct
#else
What type of directory handling do you have?
#endif
#define DIRSIZ 256
typedef struct stat Stat;
static Stat root;
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_getpwd_c[] = "#(#)$Id: getpwd.c,v 2.5 2008/02/11 08:44:50 jleffler Exp $";
#endif /* lint */
/* -- Routine: inode_number */
static ino_t inode_number(char *path, char *name)
{
ino_t inode;
Stat st;
char buff[DIRSIZ + 6];
strcpy(buff, path);
strcat(buff, "/");
strcat(buff, name);
if (stat(buff, &st))
inode = 0;
else
inode = st.st_ino;
return(inode);
}
/*
-- Routine: finddir
Purpose: Find name of present working directory
Given:
In: Inode of current directory
In: Device for current directory
Out: pathname of current directory
In: Length of buffer for pathname
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Rewritten to use opendir/readdir/closedir
25/09/90 JL Modified to pay attention to length
10/11/98 JL Convert to prototypes
*/
static int finddir(ino_t inode, dev_t device, char *path, size_t plen)
{
register char *src;
register char *dst;
char *end;
DIR *dp;
struct dirent *d_entry;
Stat dotdot;
Stat file;
ino_t d_inode;
int status;
static char name[] = "../123456789.abcd";
char d_name[DIRSIZ + 1];
if (stat("..", &dotdot) || (dp = opendir("..")) == 0)
return(-1);
/* Skip over "." and ".." */
if ((d_entry = readdir(dp)) == 0 ||
(d_entry = readdir(dp)) == 0)
{
/* Should never happen */
closedir(dp);
return(-1);
}
status = 1;
while (status)
{
if ((d_entry = readdir(dp)) == 0)
{
/* Got to end of directory without finding what we wanted */
/* Probably a corrupt file system */
closedir(dp);
return(-1);
}
else if ((d_inode = inode_number("..", d_entry->d_name)) != 0 &&
(dotdot.st_dev != device))
{
/* Mounted file system */
dst = &name[3];
src = d_entry->d_name;
while ((*dst++ = *src++) != '\0')
;
if (stat(name, &file))
{
/* Can't stat this file */
continue;
}
status = (file.st_ino != inode || file.st_dev != device);
}
else
{
/* Ordinary directory hierarchy */
status = (d_inode != inode);
}
}
strncpy(d_name, d_entry->d_name, DIRSIZ);
closedir(dp);
/**
** NB: we have closed the directory we are reading before we move out of it.
** This means that we should only be using one extra file descriptor.
** It also means that the space d_entry points to is now invalid.
*/
src = d_name;
dst = path;
end = path + plen;
if (dotdot.st_ino == root.st_ino && dotdot.st_dev == root.st_dev)
{
/* Found root */
status = 0;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
else if (chdir(".."))
status = -1;
else
{
/* RECURSE */
status = finddir(dotdot.st_ino, dotdot.st_dev, path, plen);
(void)chdir(d_name); /* We've been here before */
if (status == 0)
{
while (*dst)
dst++;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
}
if (dst >= end)
status = -1;
return(status);
}
/*
-- Routine: getpwd
Purpose: Evaluate name of current directory
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Short circuit if pwd = /
25/09/90 JL Revise interface; check length
10/11/98 JL Convert to prototypes
Known Bugs
----------
1. Uses chdir() and could possibly get lost in some other directory
2. Can be very slow on NFS with automounts enabled.
*/
char *getpwd(char *pwd, size_t plen)
{
int status;
Stat here;
if (pwd == 0)
pwd = malloc(plen);
if (pwd == 0)
return (pwd);
if (stat("/", &root) || stat(".", &here))
status = -1;
else if (root.st_ino == here.st_ino && root.st_dev == here.st_dev)
{
strcpy(pwd, "/");
status = 0;
}
else
status = finddir(here.st_ino, here.st_dev, pwd, plen);
if (status != 0)
pwd = 0;
return (pwd);
}
#ifdef TEST
#include <stdio.h>
/*
-- Routine: main
Purpose: Test getpwd()
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/90 JL Modified interface; use GETCWD to check result
*/
int main(void)
{
char pwd[512];
int pwd_len;
if (getpwd(pwd, sizeof(pwd)) == 0)
printf("GETPWD failed to evaluate pwd\n");
else
printf("GETPWD: %s\n", pwd);
if (getcwd(pwd, sizeof(pwd)) == 0)
printf("GETCWD failed to evaluate pwd\n");
else
printf("GETCWD: %s\n", pwd);
pwd_len = strlen(pwd);
if (getpwd(pwd, pwd_len - 1) == 0)
printf("GETPWD failed to evaluate pwd (buffer is 1 char short)\n");
else
printf("GETPWD: %s (but should have failed!!!)\n", pwd);
return(0);
}
#endif /* TEST */
Jonathan's answer is very fine in showing how it works. But it doesn't show a workaround for the situation you describe.
I would as well use something like you describe:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
but, in order to avoid race conditions in with threads, fork another process in order to do that.
That might sound expensive, but if you don't do that too often, it should be ok.
The idea is something like this (no runnable code, just a raw idea):
int fd[2];
pipe(fd);
pid_t pid;
if ((pid = fork()) == 0) {
// child; here we do the chdir etc. stuff
close(fd[0]); // read end
char path[PATH_MAX+1];
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
write(fd[1], path, strlen(path));
close(fd[1]);
_exit(EXIT_SUCCESS);
} else {
// parent; pid is our child
close(fd[1]); // write end
int cursor=0;
while ((r=read(fd[0], &path+cursor, PATH_MAX)) > 0) {
cursor += r;
}
path[cursor]='\0'; // make it 0-terminated
close(fd[0]);
wait(NULL);
}
I am not sure if this will resolve all issues, and I as well do not do any error checking, so that's what you should add.
Related
I write a function to count the number of files in, and below a directory (including files in the sub directory).
However, When I test the code on a directory with sub directory, it always report error said: "fail to open dir: No such file or directory".
Is there any thing I could do to make it work?
int countfiles(char *root, bool a_flag)//a_flag decide if it including hidden file
{
DIR *dir;
struct dirent * ptr;
int total = 0;
char path[MAXPATHLEN];
dir = opendir(root); //open root dirctory
if(dir == NULL)
{
perror("fail to open dir");
exit(1);
}
errno = 0;
while((ptr = readdir(dir)) != NULL)
{
//read every entry in dir
//skip ".." and "."
if(strcmp(ptr->d_name,".") == 0 || strcmp(ptr->d_name,"..") == 0)
{
continue;
}
//If it is a directory, recurse
if(ptr->d_type == DT_DIR)
{
sprintf(path,"%s%s/",root,ptr->d_name);
//printf("%s/n",path);
total += countfiles(path, a_flag);
}
if(ptr->d_type == DT_REG)
{
if(a_flag == 1){
total++;
}
else if (a_flag == 0){
if (isHidden(ptr->d_name) == 0){
total++;
}
}
}
}
if(errno != 0)
{
printf("fail to read dir");
exit(1);
}
closedir(dir);
return total;
}
Is there anything I could make it to work?
Sure, lots. Personally, I'd start by using the correct interface for this stuff, which in Linux and POSIXy systems would be nftw(). This would lead to a program that was not only shorter and more effective, but would not as easily get confused if someone renames a directory or file in the tree being scanned at the same time.
Programmers almost never implement opendir()/readdir()/closedir() as robustly and as efficiently as nftw(), scandir(), glob(), or the fts family of functions do. Why teachers still insist on using the archaic *dir() functions in this day and age, puzzles me to no end.
If you have to use the *dir functions because your teacher does not know POSIX and wants you to use interfaces you should not use in real life, then look at how you construct the path to the new directory: the sprintf() line. Perhaps even print it (path) out, and you'll probably find the fix on your own.
Even then, sprintf() is not something that is allowed in real life programs (because it will cause a silent buffer overrun when the arguments are longer than expected; and that CAN happen in Linux, because there actually isn't a fixed limit on the length of a path). You should use at minimum snprintf() and check its return value for overruns, or in Linux, asprintf() which allocates the resulting string dynamically.
How can I check whether a given path (absolute or relative) specifies the root directory of a volume with POSIX or standard C runtime calls? Ideally, the code should work on both, Linux and Mac OS X.
First, all you can really check using standard functions is if the path presented is a mount point, and that mount point may or may not be the root of its filesystem.
Assuming your system assigns a unique f_fsid value to each mount point, you can use the POSIX-stanard statvfs() function and compare the f_fsid field of the relevant statvfs structure of the path component with its parent:
#include <stdlib.h>
#include <string.h>
#include <sys/statvfs.h>
int isMountPoint( const char *path )
{
struct statvfs sv1, sv2;
char *realP = realpath( path, NULL );
// if the real path is "/", yeah, it's the root
if ( !strcmp( realP, "/" ) )
{
free( realP );
return( 1 );
}
statvfs( realP, &sv1 );
// find the parent by truncating at the last "/"
char *pp = strrchr( realP, '/' );
// if there is no parent, must be root
if ( NULL == pp )
{
free( realP );
return( 1 );
}
// keep the final / (makes handling things like "/mnt"
// much easier)
pp++;
*pp = '\0';
statvfs( realP, &sv2 );
free( realP );
// if f_fsid differs, it's the root of
// the mounted filesystem
return( sv1.f_fsid != sv2.f_fsid );
}
All error checking is left as an exercise (and it would also make this example longer and harder to understand...).
If the f_fsid of a path differs from its parent's f_fsid, or if the path is the root directory itself, the path passed in must be the mount point of its filesystem. Note that this does not have to be the actual root of the filesystem. For example, a host can export a file system via NFS, and an NFS client can mount any subdirectory in the remote file system as the "local root", or a loopback mount can refer to a subdirectory of another filesystem.
Well, you have several ways. Historically, the root inode has had inum of 2, so just stat(path, &buf); and check if (buf.st_ino == 2).
Recently, with proliferation of different filesystem types, there's no warranty of having inum equal to 2 in the root inode, so you must follow a different approach: Just check if the given path and the parent directory (just append "/.." to path) reside in different devices (st_dev field of struct stat info)
The following code illustrates this:
#include <unistd.h>
#include <limits.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
int is_mount(char *path)
{
struct stat sdir; /* inode info */
struct stat spdir; /* parent inode info */
char buffer[PATH_MAX];
int res = stat(path, &sdir);
if (res < 0) return -1;
if (snprintf(buffer, sizeof buffer, "%s/..", path) >= sizeof buffer) {
errno = ENAMETOOLONG;
return -1;
}
res = stat(buffer, &spdir);
if (res < 0) return -1;
return sdir.st_dev != spdir.st_dev; /* SEE ADDITIONAL NOTE */
}
int main(int argc, char **argv)
{
int i;
for (i = 1; i < argc; i++) {
int res = is_mount(argv[i]);
if ( res < 0 ) {
fprintf(stderr,
"%s: %s (errno = %d)\n",
argv[i], strerror(errno), errno);
continue;
}
printf("%5s\t%s\n", res ? "true" : "false", argv[i]);
} /* for */
} /* main */
Execution on a FreeBSD system:
$ ismount /home/luis/usr.ports/german/geonext/Makefile /home/luis/usr.ports/german/geonext /home/luis/usr.ports/german /home/luis/usr.ports /home/luis /home / /var/run/cups /var/run /var pru3.c
/home/luis/usr.ports/german/geonext/Makefile: Not a directory (errno = 20)
pru3.c: Not a directory (errno = 20)
false /home/luis/usr.ports/german/geonext
false /home/luis/usr.ports/german
true /home/luis/usr.ports
false /home/luis
false /home
true / <-- in the above code, this returns false, see ADDITIONAL NOTE.
false /var/run/cups
true /var/run
false /var
PORTABILITY NOTE
This should work in any un*x/linux that implements the stat(2) system call. UNIX v7 already implements this, so it's supposed to be found in all unices available.
ADDITIONAL NOTE
This approach does not work for the actual filesystem root (/) because the parent dir and the root dir both point to the same inode. This can be solved by checking the st_ino of both parent and child inodes to be equal. Just change the test for
return (sdir.st_dev != spdir.st_dev) /* different devices */
|| ( /* sdir.st_dev == spdir.st_dev && ---redundant */
sdir.st_ino == spdir.st_ino); /* root dir case */
POSIX has a realpath function that returns the canonical path of any given path (thus eliminating/resolving dots, dotdots, links, etc).
Now POSIX doesn't define volume, only the hierarchy of filenames from a unique given root /. Now Unix variants are able to mount file trees one on top of other to obtain a single tree at runtime. There is no standard way to "mount" these volumes, even if mount is a very common way, because there is plenty of ways to realize this feature. So you have to read the documentation of your OS variant to determine how you can obtain the list of all mounted points.
You may also read Linux function to get mount points and How to get mount point information from an API on Mac?.
In a terminal I can call ls -d */. Now I want a c program to do that for me, like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
int main( void )
{
int status;
char *args[] = { "/bin/ls", "-l", NULL };
if ( fork() == 0 )
execv( args[0], args );
else
wait( &status );
return 0;
}
This will ls -l everything. However, when I am trying:
char *args[] = { "/bin/ls", "-d", "*/", NULL };
I will get a runtime error:
ls: */: No such file or directory
The lowest-level way to do this is with the same Linux system calls ls uses.
So look at the output of strace -efile,getdents ls:
execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768) = 840
getdents(3, /* 0 entries */, 32768) = 0
...
getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
The lowest-level portable way (portable to POSIX systems), is to use the libc functions to open a directory and read the entries. POSIX doesn't specify the exact system call interface, unlike for non-directory files.
These functions:
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
can be used like this:
// print all directories, and symlinks to directories, in the CWD.
// like sh -c 'ls -1UF -d */' (single-column output, no sorting, append a / to dir names)
// tested and works on Linux, with / without working d_type
#define _GNU_SOURCE // includes _BSD_SOURCE for DT_UNKNOWN etc.
#include <dirent.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
DIR *dirhandle = opendir("."); // POSIX doesn't require this to be a plain file descriptor. Linux uses open(".", O_DIRECTORY); to implement this
//^Todo: error check
struct dirent *de;
while(de = readdir(dirhandle)) { // NULL means end of directory
_Bool is_dir;
#ifdef _DIRENT_HAVE_D_TYPE
if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
// don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
is_dir = (de->d_type == DT_DIR);
} else
#endif
{ // the only method if d_type isn't available,
// otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
struct stat stbuf;
// stat follows symlinks, lstat doesn't.
stat(de->d_name, &stbuf); // TODO: error check
is_dir = S_ISDIR(stbuf.st_mode);
}
if (is_dir) {
printf("%s/\n", de->d_name);
}
}
}
There's also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page. (not the Linux stat(2) man page; it has a different example).
The man page for readdir(3) says the Linux declaration of struct dirent is:
struct dirent {
ino_t d_ino; /* inode number */
off_t d_off; /* not an offset; see NOTES */
unsigned short d_reclen; /* length of this record */
unsigned char d_type; /* type of file; not supported
by all filesystem types */
char d_name[256]; /* filename */
};
d_type is either DT_UNKNOWN, in which case you need to stat to learn anything about whether the directory entry is itself a directory. Or it can be DT_DIR or something else, in which case you can be sure it is or isn't a directory without having to stat it.
Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).
Sometimes you're just matching on filenames, and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and that you have some way to detect the breakage when someone in the future tries to use this code on another FS type.)
Unfortunately, all solutions based on shell expansion are limited by the maximum command line length. Which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices -- as did Bill Gates on 640 kilobytes, once.
(When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
Fortunately, there are several solutions. One is to use find instead:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
You can also format the output as you wish, not depending on locale:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -t= for sort to use \0 as the separator, too. tr will convert them to newlines for you:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\0' | sort -t= | tr -s '\0' '\n'");
If you want the names in an array, use glob() function instead.
Finally, as I like to harp every now and then, one can use the POSIX nftw() function to implement this internally:
#define _GNU_SOURCE
#include <stdio.h>
#include <ftw.h>
#define NUM_FDS 17
int myfunc(const char *path,
const struct stat *fileinfo,
int typeflag,
struct FTW *ftwinfo)
{
const char *file = path + ftwinfo->base;
const int depth = ftwinfo->level;
/* We are only interested in first-level directories.
Note that depth==0 is the directory itself specified as a parameter.
*/
if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
return 0;
/* Don't list names starting with a . */
if (file[0] != '.')
printf("%s/\n", path);
/* Do not recurse. */
return FTW_SKIP_SUBTREE;
}
and the nftw() call to use the above is obviously something like
if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
/* An error occurred. */
}
The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.
You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.
The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
If you want to create a test directory with lots of subdirectories, use something like the following Bash:
mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
On my system, running
ls -d */
in that directory yields bash: /bin/ls: Argument list too long error, while the find command and the nftw() based program all run just fine.
You also cannot remove the directories using rmdir directory-*/ for the same reason. Use
find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
instead. Or just remove the entire directory and subdirectories,
cd ..
rm -rf lots-of-subdirs
Just call system. Globs on Unixes are expanded by the shell. system will give you a shell.
You can avoid the whole fork-exec thing by doing the glob(3) yourself:
int ec;
glob_t gbuf;
if(0==(ec=glob("*/", 0, NULL, &gbuf))){
char **p = gbuf.gl_pathv;
if(p){
while(*p)
printf("%s\n", *p++);
}
}else{
/*handle glob error*/
}
You could pass the results to a spawned ls, but there's hardly a point in doing that.
(If you do want to do fork and exec, you should start with a template that does proper error checking -- each of those calls may fail.)
If you are looking for a simple way to get a list of folders into your program, I'd rather suggest the spawnless way, not calling an external program, and use the standard POSIX opendir/readdir functions.
It's almost as short as your program, but has several additional advantages:
you get to pick folders and files at will by checking the d_type
you can elect to early discard system entries and (semi)hidden entries by testing the first character of the name for a .
you can immediately print out the result, or store it in memory for later use
you can do additional operations on the list in memory, such as sorting and removing other entries that don't need to be included.
#include <stdio.h>
#include <sys/types.h>
#include <sys/dir.h>
int main( void )
{
DIR *dirp;
struct dirent *dp;
dirp = opendir(".");
while ((dp = readdir(dirp)) != NULL)
{
if (dp->d_type & DT_DIR)
{
/* exclude common system entries and (semi)hidden names */
if (dp->d_name[0] != '.')
printf ("%s\n", dp->d_name);
}
}
closedir(dirp);
return 0;
}
Another less low-level approach, with system():
#include <stdlib.h>
int main(void)
{
system("/bin/ls -d */");
return 0;
}
Notice with system(), you don't need to fork(). However, I recall that we should avoid using system() when possible!
As Nomimal Animal said, this will fail when the number of subdirectories is too big! See his answer for more...
My code is something like this:
DIR* pDir = opendir("/path/to/my/dir");
struct dirent pFile = NULL;
while ((pFile = readdir())) {
// Check if it is a .zip file
if (subrstr(pFile->d_name,".zip") {
// It is a .zip file, delete it, and the matching log file
char zipname[200];
snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
unlink(zipname);
char* logname = subsstr(zipname, 0, strlen(pFile->d_name)-4); // Strip of .zip
logname = appendstring(&logname, ".log"); // Append .log
unlink(logname);
}
closedir(pDir);
(this code is untested and purely an example)
The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()?
Or will readdir() still find the deleted .log file?
Quote from POSIX readdir:
If a file is removed from or added to
the directory after the most recent
call to opendir() or rewinddir(),
whether a subsequent call to readdir()
returns an entry for that file is
unspecified.
So, my guess is ... it depends.
It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...
And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
Edit
I tested with this program:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void) {
struct dirent *de;
DIR *dd;
/* create files `one.zip` and `one.log` before entering the readdir() loop */
printf("creating `one.log` and `one.zip`\n");
system("touch one.log"); /* assume it worked */
system("touch one.zip"); /* assume it worked */
dd = opendir("."); /* assume it worked */
while ((de = readdir(dd)) != NULL) {
printf("found %s\n", de->d_name);
if (strstr(de->d_name, ".zip")) {
char logname[1200];
size_t i;
if (*de->d_name == 'o') {
/* create `two.zip` and `two.log` when the program finds `one.zip` */
printf("creating `two.zip` and `two.log`\n");
system("touch two.zip"); /* assume it worked */
system("touch two.log"); /* assume it worked */
}
printf("unlinking %s\n", de->d_name);
if (unlink(de->d_name)) perror("unlink");
strcpy(logname, de->d_name);
i = strlen(logname);
logname[i-3] = 'l';
logname[i-2] = 'o';
logname[i-1] = 'g';
printf("unlinking %s\n", logname);
if (unlink(logname)) perror("unlink");
}
}
closedir(dd); /* assume it worked */
return 0;
}
On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...
I'm testing my new Linux reference book. The Linux Programming Interface by Michael Kerrisk and it says the following:
SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from since the last since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned anymore whether or not you unlink the current dirent.
Also note the guarantee provided by the second sentence. Since you are leaving alone the other files and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is undefined. it may or may not be returned by readdir() but in your case, it shouldn't be harmful.
The reason why I have explored the question it is to find an efficient way to close file descriptors in a child process before exec().
The suggested way in APUE from Stevens is to do the following:
int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
close(i);
but I am thinking using code similar to what is found in the OP to scan /dev/fd/ directory to know exactly which fds I need to close. (Special note to myself, skip over dirfd contained in the DIR handle.)
I found the following page describe the solution of this problem.
https://support.apple.com/kb/TA21420
I've been trying to get this code to work for hours! All I need to do is open a file to see if it is real and readable. I'm new to C so I'm sure there is something stupid I'm missing. Here is the code (shorthand, but copied):
#include <stdio.h>
main() {
char fpath[200];
char file = "/test/file.this";
sprintf(fpath,"~cs4352/projects/proj0%s",file);
FILE *fp = fopen(fpath,"r");
if(fp==NULL) {
printf("There is no file on the server");
exit(1);
}
fclose(fp);
//do more stuff
}
I have also verified that the path is correctly specifying a real file that I have read permissions to. Any other ideas?
Edit 1: I do know that the fpath ends up as "~cs4352/projects/proj0/test/file.this"
Edit 2: I have also tried the using the absolute file path. In both cases, I can verify that the paths are properly built via ls.
Edit 3: There errno is 2... I'm currently trying to track what that means in google.
Edit 4: Ok, errno of 2 is "There is no such file or directory". I am getting this when the reference path in fopen is "/home/courses1/cs4352/projects/proj0/index.html" which I verified does exist and I have read rights to it. As for the C code listed below, there may be a few semantic/newbie errors in it, but gcc does not give me any compile time warnings, and the code works exactly as it should except that it says that it keeps spitting errno of 2. In other words, I know that all the strings/char array are working properly, but the only thing that could be an issue is the fopen() call.
Solution: Ok, the access() procedure is what helped me the most (and what i am still using as it is less code, not to mention the more elegant way of doing it). The problem actually came from something that I didn't explain to you all (because I didn't see it until I used access()). To derrive the file, I was splitting strings using strtok() and was only splitting on " \n", but because this is a UNIX system, I needed to add "\r" to it as well. Once I fixed that, everything fell into place, and I'm sure that the fopen() function would work as well, but I have not tested it.
Thank you all for your helpful suggestions, and especially to Paul Beckingham for finding this wonderful solution.
Cheers!
The "~" is expanded by the shell, and is not expanded by fopen.
To test the existence and readability of a file, consider using the POSIX.1 "access" function:
#include <unistd.h>
if (access ("/path/to/file", F_OK | R_OK) == 0)
{
// file exists and is readable
}
First, file needs to be declared as char* or const char*, not simply char as you've written. But this might just be a typo, the compiler should at least give a warning there.
Secondly, use an absolute path (or a path relative to the current directory), not shell syntax with ~. The substitution of ~cs4352 by the respective home directory is usually done by the shell, but you are directly opening the file. So you are trying to open a file in a ~cs4352 subdirectory of your current working directory, which I guess is not what you want.
Other people have probably produced the equivalent (every modern shell, for example), but here's some code that will expand a filename with ~ or ~user notation.
#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif
#include <assert.h>
#include <limits.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
char *relfname(const char *name, char *buffer, size_t bufsiz)
{
assert(name != 0 && buffer != 0 && bufsiz != 0);
if (name[0] != '~')
strncpy(buffer, name, bufsiz);
else
{
const char *copy;
struct passwd *usr = 0;
if (name[1] == '/' || name[1] == '\0')
{
usr = getpwuid(getuid());
copy = &name[1];
}
else
{
char username[PATH_MAX];
copy = strchr(name, '/');
if (copy == 0)
copy = name + strlen(name);
strncpy(username, &name[1], copy - &name[1]);
username[copy - &name[1]] = '\0';
usr = getpwnam(username);
}
if (usr == 0)
return(0);
snprintf(buffer, bufsiz, "%s%s", usr->pw_dir, copy);
}
buffer[bufsiz-1] = '\0';
return buffer;
}
#ifdef TEST
static struct { const char *name; int result; } files[] =
{
{ "/etc/passwd", 1 },
{ "~/.profile", 1 },
{ "~root/.profile", 1 },
{ "~nonexistent/.profile", 0 },
};
#define DIM(x) (sizeof(x)/sizeof(*(x)))
int main(void)
{
int i;
int fail = 0;
for (i = 0; i < DIM(files); i++)
{
char buffer[PATH_MAX];
char *name = relfname(files[i].name, buffer, sizeof(buffer));
if (name == 0 && files[i].result != 0)
{
fail++;
printf("!! FAIL !! %s\n", files[i].name);
}
else if (name != 0 && files[i].result == 0)
{
fail++;
printf("!! FAIL !! %s --> %s (unexpectedly)\n", files[i].name, name);
}
else if (name == 0)
printf("** PASS ** %s (no match)\n", files[i].name);
else
printf("** PASS ** %s -> %s\n", files[i].name, name);
}
return((fail == 0) ? EXIT_SUCCESS : EXIT_FAILURE);
}
#endif
You could try examining errno for more information on why you're not getting a valid FILE*.
BTW-- in unix the global value errno is set by some library and system calls when they need to return more information than just "it didn't work". It is only guaranteed to be good immediately after the relevant call.
char file = "/test/file.this";
You probably want
char *file = "/test/file.this";
Are you sure you do not mean
~/cs4352/projects/proj0%s"
for your home directory?
To sum up:
Use char *file=/test/file.this";
Don't expect fopen() to do shell substitution on ~ because it won't. Use the full path or use a relative path and make sure the current directory is approrpriate.
error 2 means the file wasn't found. It wasn't found because of item #2 on this list.
For extra credit, using sprintf() like this to write into a buffer that's allocated on the stack is a dangerous habit. Look up and use snprintf(), at the very least.
As someone else here mentioned, using access() would be a better way to do what you're attempting here.