stat alternative for long file paths - c

I'm writing a program that iterates through a directory tree depth-first (similar to the GNU find program). It recursively constructs the path to each file in the tree and stores the relative paths of the files it encounters. It also collects some statistics about these files, for which I'm using the stat function.
I've noticed that this fails for very deep directory hierarchies, i.e. long file paths. stat's documentation says this can happen (ENAMETOOLONG).
Now my question is: what alternative approach could I use here that is guaranteed to work for paths of any length? (I don't need working code; a rough outline would be sufficient.)

As you are traversing, open each directory you traverse.
You can then get information about a file in that directory using fstatat. The fstatat function takes an additional parameter, dirfd. If you pass a handle to an open directory in that parameter, the path is interpreted as relative to that directory.
int fstatat(int dirfd, const char *pathname, struct stat *buf, int flags);
The basic usage is:
int dirfd = open("directory path", O_RDONLY);
struct stat st;
int r = fstatat(dirfd, "relative file path", &st, 0);
You can, of course, also use openat instead of open, as you recurse. And the special value AT_FDCWD can be passed as dirfd to refer to the current working directory.
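The basic usage above can be turned into the following compilable sketch. The helper name stat_in_dir is hypothetical, and it assumes a POSIX.1-2008 system. In a real traversal you would keep each directory fd open and reuse it for every entry, rather than reopening the directory for each lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look up `name` inside directory `dirpath` without concatenating the two.
   Returns 0 and fills *st on success, or -1 with errno set. */
int stat_in_dir(const char *dirpath, const char *name, struct stat *st)
{
    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;
    /* `name` is resolved relative to dfd, not to the current directory. */
    int r = fstatat(dfd, name, st, 0);
    int saved_errno = errno;
    close(dfd);
    errno = saved_errno;   /* keep fstatat's error, not close()'s */
    return r;
}
```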
Caveats
It is easy to get into symlink loops and recurse forever. It is not uncommon to find symlink loops in practice. On my system, /usr/bin/X11 is a symlink to /usr/bin.
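The sketch below combines these pieces into a depth-first walk that never builds a path longer than a single entry name. It assumes a POSIX.1-2008 system, and the names walk_tree, walk_fd and struct tree_stats are hypothetical. It passes AT_SYMLINK_NOFOLLOW to fstatat() and O_NOFOLLOW to openat(). As a result, a symlink such as /usr/bin/X11 is counted as neither a file nor a directory and cannot cause a loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct tree_stats {
    long long files;   /* regular files seen */
    long long dirs;    /* directories seen, not counting the root */
    long long bytes;   /* sum of st_size over regular files */
};

/* Walk the directory open on `dfd`; takes ownership of dfd.
   Every lookup is relative to a directory fd, so no full path is built. */
static int walk_fd(int dfd, struct tree_stats *ts)
{
    DIR *d = fdopendir(dfd);
    if (d == NULL) {
        close(dfd);
        return -1;
    }
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        struct stat st;
        /* Do not follow symlinks, so a link to an ancestor cannot loop. */
        if (fstatat(dirfd(d), ent->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1)
            continue;
        if (S_ISREG(st.st_mode)) {
            ts->files++;
            ts->bytes += st.st_size;
        } else if (S_ISDIR(st.st_mode)) {
            ts->dirs++;
            int child = openat(dirfd(d), ent->d_name,
                               O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            if (child != -1)
                walk_fd(child, ts);   /* errors below this level are skipped */
        }
    }
    closedir(d);   /* also closes dfd */
    return 0;
}

/* Returns 0 on success, -1 if `root` cannot be opened as a directory. */
int walk_tree(const char *root, struct tree_stats *ts)
{
    memset(ts, 0, sizeof *ts);
    int dfd = open(root, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;
    return walk_fd(dfd, ts);
}
```

This design has one caveat. Each level of recursion keeps one directory fd open, so the maximum depth is bounded by the process's open-file limit rather than by PATH_MAX.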
Alternatives
There are easier ways to traverse file hierarchies. Use ftw or fts instead, if you can.
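For comparison, here is a minimal fts-based sketch. fts is a BSD interface that glibc also provides, and the function name fts_total_bytes is hypothetical. FTS_PHYSICAL makes fts report symlinks instead of following them. Leaving FTS_NOCHDIR unset lets fts chdir() into directories as it walks, as described in fts(3).

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

/* Total st_size of the regular files under `root`, or -1 on error.
   If nfiles is not NULL, the number of regular files is stored there. */
long long fts_total_bytes(const char *root, long long *nfiles)
{
    /* fts_open() takes a NULL-terminated array of starting paths. */
    char *paths[] = { (char *)root, NULL };
    /* FTS_PHYSICAL: return symlinks themselves (FTS_SL), do not follow them. */
    FTS *fts = fts_open(paths, FTS_PHYSICAL, NULL);
    if (fts == NULL)
        return -1;
    long long total = 0, count = 0;
    FTSENT *e;
    while ((e = fts_read(fts)) != NULL) {
        if (e->fts_info == FTS_F) {   /* regular file; fts_statp is filled in */
            total += e->fts_statp->st_size;
            count++;
        }
    }
    fts_close(fts);
    if (nfiles != NULL)
        *nfiles = count;
    return total;
}
```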


How to check if a symbolic link refers to a directory

I am currently reimplementing the "ls" command as a learning exercise. However, when I browse files, I sometimes get an error when I try to open the "folder" that a symbolic link points to, because it's not a directory (I thought all symbolic links pointed to folders).
How can I check whether it points to a directory? (I have looked at the manuals for stat, dir, etc.)
I thought all symbolic links pointed to folders
Nope. A symbolic link is an indirect reference to another path. That other path can refer to any kind of file that can be represented in any mounted file system, or to no file at all (i.e. it can be a broken link).
How to check that it points to a directory?
You mention the stat() function, but for reimplementing ls you should mostly be using lstat(), instead. The difference is that when the specified path refers to a symbolic link, stat returns information about the link's target path, whereas lstat returns information about the link itself (including information about the file type, from which you can tell that it is a link).
If you encounter a symbolic link, you can simply check the same path again with stat() to find out what kind of file it points to. stat() follows symbolic links recursively and returns the information for the ultimate target. If the link is broken or part of a loop, stat() fails instead (with ENOENT or ELOOP, respectively). Either way, for your particular purpose you don't need to distinguish between a broken link and any other kind of non-directory.
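The lstat()-then-stat() approach can be sketched as follows, assuming a POSIX system. is_symlink_to_dir is a hypothetical helper name. For an entry that is not a symlink, the lstat() result already tells you whether it is a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* True only if `path` is a symbolic link whose final target is a directory. */
bool is_symlink_to_dir(const char *path)
{
    struct stat lst, st;
    /* lstat() describes the entry itself, so S_ISLNK can be tested. */
    if (lstat(path, &lst) == -1 || !S_ISLNK(lst.st_mode))
        return false;   /* missing, or not a symlink at all */
    /* stat() follows the chain of links to the final target. */
    if (stat(path, &st) == -1)
        return false;   /* broken link or link loop: not a directory */
    return S_ISDIR(st.st_mode);
}
```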
I just ran into the same problem, and here is my solution:
#include <string>
#include <sys/stat.h>

bool IsDir(const char *path)
{
    std::string tmp = path;
    tmp += '/';   // append a trailing slash; this is what makes it work
    struct stat statbuf;
    return (lstat(tmp.c_str(), &statbuf) >= 0) && S_ISDIR(statbuf.st_mode);
}
The key is the trailing / in the path.
However, I have no idea whether it's portable.

Is there a Linux C API call to query a mounted filesystem to see if it is read-only?

First, a little background information to provide some motivation for this question: I've got a program that runs on a headless Linux server and reads/writes files on several removable external hard drives, each of which is formatted with the ext4 filesystem. Very occasionally, the filesystem metadata on one of these drives gets corrupted for whatever reason (ext4 journalling notwithstanding). This can cause the ext4 filesystem driver to detect a problem and remount the partition as read-only, presumably as a precaution against cascading errors corrupting the drive further.
Okay, fair enough; but what I'd like to do now is add a function to my program that can detect when the drive is in this remounted-read-only state, so that it can proactively notify the user that his drive is in trouble.
My question is, what is an elegant/supported way to query a filesystem to find out if it is mounted read-only?
Attempting to write a file to the filesystem isn't good enough, because that could fail for other reasons, and also because I don't want to write to the filesystem if I don't have to.
My program could fopen("/proc/mounts", "r") and parse the lines of text that it generates (grepping for the "rw," token on the line corresponding to my partition), and I will if I have to, but that solution seems a bit hacky (too much like screen-scraping, liable to break if the text format ever changes).
So, is there some lightweight/purpose-built Linux system call that I could use that would tell me whether a given filesystem mount point (e.g. "/dev/sda1") is currently mounted read-only? It seems like stat() might be able to do it, but I can't see how.
The getmntent() family should meet your needs.
NAME
getmntent, setmntent, addmntent, endmntent, hasmntopt, getmntent_r -
get filesystem descriptor file entry
SYNOPSIS
#include <stdio.h>
#include <mntent.h>
FILE *setmntent(const char *filename, const char *type);
struct mntent *getmntent(FILE *stream);
int addmntent(FILE *stream, const struct mntent *mnt);
int endmntent(FILE *streamp);
char *hasmntopt(const struct mntent *mnt, const char *opt);
/* GNU extension */
#include <mntent.h>
struct mntent *getmntent_r(FILE *streamp, struct mntent *mntbuf,
char *buf, int buflen);
DESCRIPTION
These routines are used to access the filesystem description file
/etc/fstab and the mounted filesystem description file /etc/mtab.
The setmntent() function opens the filesystem description file
filename and returns a file pointer which can be used by getmntent().
The argument type is the type of access required and can take the same
values as the mode argument of fopen(3).
The getmntent() function reads the next line of the filesystem
description file from stream and returns a pointer to a structure
containing the broken out fields from a line in the file. The pointer
points to a static area of memory which is overwritten by subsequent
calls to getmntent().
The addmntent() function adds the mntent structure mnt to the end of
the open stream.
The endmntent() function closes the stream associated with the
filesystem description file.
The hasmntopt() function scans the mnt_opts field (see below) of the
mntent structure mnt for a substring that matches opt. See
mount(8) for valid mount options.
The reentrant getmntent_r() function is similar to getmntent(), but
stores the struct mount in the provided *mntbuf and stores the strings
pointed to by the entries in that struct in the provided array buf of
size buflen.
The mntent structure is defined in <mntent.h> as follows:
struct mntent {
char *mnt_fsname; /* name of mounted filesystem */
char *mnt_dir; /* filesystem path prefix */
char *mnt_type; /* mount type (see mntent.h) */
char *mnt_opts; /* mount options (see mntent.h) */
int mnt_freq; /* dump frequency in days */
int mnt_passno; /* pass number on parallel fsck */
};
...
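Here is a sketch of how the getmntent() family could answer the original question. It assumes Linux, where /proc/mounts reflects the kernel's current mount table, including a remount to read-only after errors. The helper name mount_is_readonly is hypothetical. Note that mount_dir must be the mount point (the mnt_dir field), not the device node.

```c
#define _DEFAULT_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the filesystem mounted at `mount_dir` is read-only,
   0 if it is read-write, or -1 if /proc/mounts has no entry for it. */
int mount_is_readonly(const char *mount_dir)
{
    /* /proc/mounts is the kernel's live view (Linux-specific). */
    FILE *fp = setmntent("/proc/mounts", "r");
    if (fp == NULL)
        return -1;
    int result = -1;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* Keep scanning: if several filesystems were mounted on the same
           directory, the later line is typically the one on top. */
        if (strcmp(m->mnt_dir, mount_dir) == 0)
            result = hasmntopt(m, "ro") != NULL;   /* look for the "ro" option */
    }
    endmntent(fp);
    return result;
}
```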
The easiest way to check that the filesystem of an open file for writing has become mounted read-only is to check the errno variable for EROFS error.
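Here is a sketch of that errno check, assuming POSIX write(2). write_checking_erofs is a hypothetical wrapper that separates EROFS from other failures. An open() for writing on a read-only filesystem also fails with EROFS, so the same test applies there.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`.
   Returns 0 on success, 1 if the write failed with EROFS,
   or -1 for any other error. */
int write_checking_erofs(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return errno == EROFS ? 1 : -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```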
If you have no writable directory or file on that filesystem, there is no portable way to check whether the filesystem has become read-only (especially if it became so because of device errors).
Another way is to ask the administrator to check, or to read the /proc/mounts file yourself, but that is Linux-specific.

Ext2 - how is a file created

What does the process of creating a file on an ext2 filesystem look like?
I am trying to make a simple syscall which takes a path and creates given file - like touch.
For example, the code:
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    syscall(MY_SYSCALL_NUMBER, "/tmp/file");
    return 0;
}
Should create a file called "file" in /tmp.
Now how should the syscall itself work?
My work so far (I omitted error checking here for readability):
asmlinkage long sys_ccp(const char __user *arg)
{
    struct path path;
    struct inode *new_inode;
    struct qstr qname;   /* the kernel's name type is struct qstr */

    // omitted copy from user for simplicity
    qname.name = arg;
    qname.len = length(arg);
    kern_path(src, LOOKUP_FOLLOW, &path);
    new_inode = ext2_new_inode(path.dentry->d_parent->d_inode, S_IFREG, &qname);
}
This seems to work (I can see in logs that an inode is allocated), however, when I call ls on the directory I can't see the file there.
My idea was to add the new inode to struct dentry of directory, so I added this code:
struct dentry *new_dentry;
new_dentry = d_alloc(path.dentry->d_parent, &qname);
d_instantiate(new_dentry, new_inode);
However, this still doesn't seem to work (I can't see the file using ls).
How to implement this syscall correctly, what am I missing?
EDIT:
Regarding R..'s answer: the purpose of this syscall is to play around with ext2 and learn about its design, so we can assume that the path is always valid, that the filesystem really is ext2, and so on.
You're completely mixing up the abstraction layers involved. If something like your code could even work at all (not sure if it can), it would blow up badly and crash the kernel or lead to runaway wrong code execution if someone happened to make this syscall on a path that didn't actually correspond to an ext2 filesystem.
In the kernel's fs abstraction, the fact that the underlying filesystem is ext2 (or whatever it is) is irrelevant to the task of making a file on it. Rather all of this has to go through fs-type-agnostic layers which in turn end up using the fs-type-specific backends for the fs mounted at the path.
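From user space, that fs-agnostic path is simply open(2) with O_CREAT. The kernel's VFS looks up the parent directory and calls the create operation of whichever filesystem is mounted there. The touch-like helper below is a sketch, and the name touch_file is hypothetical. The in-kernel equivalent is not shown because the internal VFS function signatures differ between kernel versions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `path` if needed and set its access/modification times to now,
   roughly what touch(1) does. Returns 0 on success, -1 on error. */
int touch_file(const char *path)
{
    /* The VFS dispatches this to the filesystem that owns the parent dir. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    int r = futimens(fd, NULL);   /* NULL: both timestamps = current time */
    if (close(fd) == -1)
        r = -1;
    return r;
}
```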

stat() giving wrong directory size in c

I need to use stat() to find the size of whatever file or directory is given on the command line. It works fine for files (with both relative and absolute paths), but when I give it a directory, it always returns a size of 512 or 1024.
If I print the files in the directory it goes as follows :
Name : .
Name : ..
Name : new
Name : new.c
but only the new and new.c files are actually in there. For this, the size is returned as 512 even if I place more files in the directory.
Here's my code fragment:
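if (stat(request.data, &st) >= 0) {
    request.msgType = (short)0xfe21;
    /* st_size is an off_t; cast so the format specifier always matches */
    printf("\n Size : %lld\n", (long long)st.st_size);
    snprintf(reply.data, sizeof reply.data, "%lld", (long long)st.st_size);
    reply.dataLen = strlen(reply.data);
}
else {
    perror("\n Stat()");
}
}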
Where did I go wrong???
Here are my request and reply structures:
struct message{
unsigned short msgType;
unsigned int offset;
unsigned int serverDelay;
unsigned int dataLen;
char data[100];
};
struct message request,reply;
I compile it with gcc on a Unix OS.
stat() on a directory doesn't return the sum of the sizes of the files in it. Instead, the size field reports how much space is taken by the directory itself (the storage for its list of entries), and it varies depending on a few factors, such as the filesystem. If you want to know how much space is taken by all files below a specific directory, you have to recurse down the tree, adding up the space taken by all files. This is how tools like du work.
Yes. opendir() + loop on readdir()/stat() will give you the file/directory sizes which you can sum to get a total. If you have sub-directories you will also have to loop on those and the files within them.
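Here is a sketch of that recursion, assuming a POSIX system. The name dir_total_size is hypothetical. It uses lstat() rather than stat() so that symlinks are not followed, and it sums st_size of regular files only. Its result will therefore differ from du, which by default reports allocated blocks and also counts the directories themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Sum st_size of all regular files at or below `path`.
   Returns -1 if `path` itself cannot be examined. */
long long dir_total_size(const char *path)
{
    struct stat st;
    if (lstat(path, &st) == -1)
        return -1;
    if (S_ISREG(st.st_mode))
        return st.st_size;
    if (!S_ISDIR(st.st_mode))
        return 0;   /* symlinks, devices, etc. contribute nothing */

    DIR *d = opendir(path);
    if (d == NULL)
        return 0;   /* unreadable directory: count nothing below it */
    long long total = 0;
    struct dirent *ent;
    char child[4096];
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;   /* path too long for this sketch: skip it */
        long long sub = dir_total_size(child);
        if (sub > 0)
            total += sub;
    }
    closedir(d);
    return total;
}
```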
To use du, you could use the system() function. It only returns an exit status to the calling program, so you could save du's output to a file and then read the file. The code would be something like:
system("du -sb dirname > du_res_file");
Then you can read the file du_res_file (assuming it has been created successfully) to get your answer. This would give the size of the directory + sub-directories + files in one go.
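The system()-plus-file approach might look like the sketch below. du_bytes is a hypothetical name. It assumes GNU du (for the -b option) and directory and file names that contain no single quotes, because both names are pasted into a shell command.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Run `du -sb` on `dir`, redirect its output to `resfile`, then read the
   byte count back. Returns -1 on any failure. */
long long du_bytes(const char *dir, const char *resfile)
{
    char cmd[1024];
    int n = snprintf(cmd, sizeof cmd, "du -sb '%s' > '%s'", dir, resfile);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;   /* command would not fit in the buffer */
    if (system(cmd) != 0)
        return -1;   /* du (or the shell) reported an error */
    FILE *f = fopen(resfile, "r");
    if (f == NULL)
        return -1;
    long long bytes = -1;
    if (fscanf(f, "%lld", &bytes) != 1)   /* first field: size in bytes */
        bytes = -1;
    fclose(f);
    return bytes;
}
```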
I'm sorry, I missed it the first time: stat only gives the size of files, not directories:
These functions return information about a file. No permissions are required on the file itself, but, in the case of stat() and lstat(), execute (search) permission is required on all of the directories in path that lead to the file.
The st_size field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symbolic link is the length of the pathname it contains, without a terminating null byte.
Look at the man page for fstat/stat.

Efficiently Traverse Directory Tree with opendir(), readdir() and closedir()

The C routines opendir(), readdir() and closedir() provide a way for me to traverse a directory structure. However, each dirent structure returned by readdir() does not seem to provide a useful way for me to obtain the set of pointers to DIR that I would need to recurse into the directory subdirectories.
Of course, they give me the name of the files, so I could either append that name to the directory path and stat() and opendir() them, or I could change the current working directory of the process via chdir() and roll it back via chdir("..").
The problem with the first approach is that if the directory path is long enough, the cost of passing a string containing it to opendir() will outweigh the cost of opening the directory. More theoretically, the complexity could grow beyond linear time (in the total character count of the (relative) filenames in the directory tree).
Also, the second approach has a problem. Since each process has a single current working directory, all but one thread will have to block in a multithreaded application. Also, I don't know if the current working directory is just a mere convenience (i.e., the relative path will be appended to it prior to a filesystem query). If it is, this approach will be inefficient too.
I am accepting alternatives to these functions. So how is it one can traverse a UNIX directory tree efficiently (linear time in the total character count of the files under it)?
Have you tried ftw(), aka File Tree Walk?
Snippet from man 3 ftw:
int ftw(const char *dir, int (*fn)(const char *file, const struct stat *sb, int flag), int nopenfd);
ftw() walks through the directory tree starting from the indicated directory dir. For each found entry in the tree, it calls fn() with the full pathname of the entry, a pointer to the stat(2) structure for the entry and an int flag.
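Here is a sketch using nftw(), the extended variant documented on the same man page, to total file sizes. The names nftw_total_bytes and add_entry are hypothetical. FTW_PHYS keeps nftw from following symlinks, and 16 is an arbitrary limit on the number of directory fds it may keep open.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() reports results through a callback, so the totals live here. */
static long long g_bytes;
static long long g_files;

static int add_entry(const char *fpath, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
    (void)fpath;
    (void)ftwbuf;
    if (typeflag == FTW_F) {   /* a file that is not a directory or symlink */
        g_bytes += sb->st_size;
        g_files++;
    }
    return 0;                  /* 0 tells nftw() to keep walking */
}

/* Total st_size of the files under `root`, or -1 on error. */
long long nftw_total_bytes(const char *root, long long *nfiles)
{
    g_bytes = 0;
    g_files = 0;
    /* FTW_PHYS: report symlinks as FTW_SL instead of following them. */
    if (nftw(root, add_entry, 16, FTW_PHYS) == -1)
        return -1;
    if (nfiles != NULL)
        *nfiles = g_files;
    return g_bytes;
}
```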
You seem to be missing one basic point: directory traversal involves reading data from the disk. Even when/if that data is in the cache, you end up going through a fair amount of code to get it from the cache into your process. Paths are also generally pretty short -- any more than a couple hundred bytes is pretty unusual. Together these mean that you can pretty reasonably build up strings for all the paths you need without any real problem. The time spent building the strings is still pretty minor compared to the time to read data from the disk. That means you can normally ignore the time spent on string manipulation, and work exclusively at optimizing disk usage.
My own experience has been that for most directory traversal a breadth-first search is usually preferable -- as you're traversing the current directory, put the full paths to all sub-directories in something like a priority queue. When you're finished traversing the current directory, pull the first item from the queue and traverse it, continuing until the queue is empty. This generally improves cache locality, so it reduces the amount of time spent reading the disk. Depending on the system (disk speed vs. CPU speed, total memory available, etc.) it's nearly always at least as fast as a depth-first traversal, and can easily be up to twice as fast (or so).
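Here is a sketch of the breadth-first idea. It uses a plain FIFO queue of path strings rather than the priority queue mentioned above, and bfs_count and struct path_queue are hypothetical names. Whether this is actually faster than depth-first on a given system is something to measure, not assume.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Growable FIFO of heap-allocated path strings. */
struct path_queue {
    char **items;
    size_t head;   /* next item to pop */
    size_t tail;   /* next free slot */
    size_t cap;
};

static int pq_push(struct path_queue *q, const char *path)
{
    if (q->tail == q->cap) {
        size_t ncap = q->cap ? q->cap * 2 : 64;
        char **ni = realloc(q->items, ncap * sizeof *ni);
        if (ni == NULL)
            return -1;
        q->items = ni;
        q->cap = ncap;
    }
    char *copy = strdup(path);
    if (copy == NULL)
        return -1;
    q->items[q->tail++] = copy;
    return 0;
}

/* Breadth-first walk: read a whole directory, queueing its subdirectories,
   before descending into any of them. Returns the number of entries seen
   below `root` (not counting root itself), or -1 if the queue cannot be
   created. Subdirectories that cannot be queued are skipped. */
long long bfs_count(const char *root)
{
    struct path_queue q = {0};
    long long count = 0;
    if (pq_push(&q, root) == -1) {
        free(q.items);
        return -1;
    }
    while (q.head < q.tail) {
        char *dir = q.items[q.head++];
        DIR *d = opendir(dir);
        if (d != NULL) {
            struct dirent *ent;
            while ((ent = readdir(d)) != NULL) {
                if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                    continue;
                count++;
                size_t len = strlen(dir) + strlen(ent->d_name) + 2;
                char *child = malloc(len);
                if (child == NULL)
                    continue;
                snprintf(child, len, "%s/%s", dir, ent->d_name);
                struct stat st;
                /* lstat: do not follow symlinks into other parts of the tree */
                if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
                    pq_push(&q, child);
                free(child);
            }
            closedir(d);
        }
        free(dir);
    }
    free(q.items);
    return count;
}
```

For simplicity, the queue never reuses slots it has already consumed, so its memory grows with the number of directories visited.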
The way to use opendir/readdir/closedir is to make the function recursive! Have a look at the snippet here on Dreamincode.net.
Hope this helps.
EDIT: Thanks, R.Sahu; the link has expired. However, I found it via the Wayback Machine archive and took the liberty of adding it to a gist. Please remember to check the license and to attribute the original author of the source! :)
Probably overkill for your application, but here's a library designed to traverse a directory tree with hundreds of millions of files.
https://github.com/hpc/libcircle
Instead of opendir(), you can use a combination of openat(), dirfd() and fdopendir() and construct a recursive function to walk a directory tree:
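#define _DEFAULT_SOURCE   /* glibc: exposes DT_DIR / DT_UNKNOWN */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

void
dir_recurse (DIR *parent, int level)
{
    struct dirent *ent;
    DIR *child;
    int fd;

    while ((ent = readdir(parent)) != NULL) {
        if ((strcmp(ent->d_name, ".") == 0) ||
            (strcmp(ent->d_name, "..") == 0)) {
            continue;
        }
        int is_dir = (ent->d_type == DT_DIR);
        if (ent->d_type == DT_UNKNOWN) {
            /* some filesystems do not fill in d_type; ask fstatat() instead */
            struct stat st;
            is_dir = fstatat(dirfd(parent), ent->d_name, &st,
                             AT_SYMLINK_NOFOLLOW) == 0 && S_ISDIR(st.st_mode);
        }
        if (is_dir) {
            printf("%*s%s/\n", level, "", ent->d_name);
            fd = openat(dirfd(parent), ent->d_name, O_RDONLY | O_DIRECTORY);
            if (fd == -1) {
                perror("openat");
                continue;
            }
            child = fdopendir(fd);
            if (child == NULL) {
                perror("fdopendir");
                close(fd);
                continue;
            }
            dir_recurse(child, level + 1);
            closedir(child);   /* also closes fd */
        } else {
            printf("%*s%s\n", level, "", ent->d_name);
        }
    }
}

int
main (void)
{
    DIR *root;

    root = opendir(".");
    if (root == NULL) {
        perror("opendir");
        return 1;
    }
    dir_recurse(root, 0);
    closedir(root);
    return 0;
}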
Here readdir() is still used to get the next directory entry. If the next entry is a directory, then we find the parent directory fd with dirfd() and pass this, along with the child directory name to openat(). The resulting fd refers to the child directory. This is passed to fdopendir() which returns a DIR * pointer for the child directory, which can then be passed to our dir_recurse() where it again will be valid for use with readdir() calls.
This program recurses over the whole directory tree rooted at the current directory (.). Entries are printed indented by one space per directory level, and directories are printed with a trailing /.
