In C, using POSIX calls, how can I determine if a path is inside a target directory?
For example, a web server has its root directory in /srv, this is getcwd() for the daemon.
When parsing a request for /index.html, it returns the contents of /srv/index.html.
How can I filter out requests for paths outside of /srv?
/../etc/passwd,
/valid/../../etc/passwd,
etc.
Splitting the path at / and rejecting any array containing .. will break valid accesses /srv/valid/../index.html.
Is there a canonical way to do this with system calls? Or do I need to manually walk the path and count directory depth?
There's always realpath:
The realpath() function shall derive, from the pathname pointed to by *file_name*, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.' , '..' , or symbolic links.
Then compare what realpath gives you with your desired root directory and see if they match up.
You could also clean up the filename by hand by expanding the double-dots before you prepend the "/srv". Split the incoming path on slashes and walk through it piece by piece. If you get a "." then remove it and move on; if you get a "..", then remove it and the previous component (taking care not go past the first entry in your list); if you get anything else, just move on to the next component. Then paste what's left back together with slashes between the components and prepend your "/srv/". So if someone gives you "/valid/../../etc/passwd", you'll end up with "/srv/etc/passwd" and "/where/is/../pancakes/house" will end up as "/srv/where/pancakes/house".
That way you can't get outside "/srv" (except through symbolic links of course) and an incoming "/../.." will be the same as "/" (just like in a normal file system). But you'd still want to use realpath if you're worried about symbolic under "/srv".
Working with the path name component by component would also allow you to break the connection between the layout you present to the outside world and the actual file system layout; there's no need for "/this/that/other/thing" to map to an actual "/srv/this/that/other/thing" file anywhere, the path could just be a key in some sort of database or some sort of namespace path to a function call.
To determine if a file F is within a directory D, first stat D to determine its device number and inode number (members st_dev and st_ino of struct stat).
Then stat F to determine if it is a directory. If not, call basename to determine the name of the directory containing it. Set G to the name of this directory. If F was already a directory, set G=F.
Now, F is within D if and only if G is within D. Next we have a loop.
while (1) {
if (samefile(d_statinfo.d_dev, d_statinfo.d_ino, G)) {
return 1; // F was within D
} else if (0 == strcmp("/", G) {
return 0; // F was not within D.
}
G = dirname(G);
}
The samefile function is simple:
int samefile(dev_t ddev, ino_t dino, const char *path) {
struct stat st;
if (0 == stat(path, &st)) {
return ddev == st.st_dev && dino == st.st_no;
} else {
throw ...; // or return error value (but also change the caller to detect it)
}
}
This will work on POSIX filesystems. But many filesystems are not POSIX. Problems to look out for include:
Filesystems where the device/inode are not unique. Some FUSE filesystems are examples of this; they sometimes make up inode numbers when the underlying filesystems don't have them. They shouldn't re-use inode numbers, but some FUSE filesystems have bugs.
Broken NFS implementations. On some systems all NFS filesystems have the same device number. If they pass through the inode number as it exists on the server, this could cause a problem (though I've never seen it happen in practice).
Linux bind mount points. If /a is a bind mount of /b, then /a/1 correctly appears to be inside /a, but with the implementation above, /b/1 also appears to be inside /a. I think that's probably the correct answer. However, if this is not the result you prefer, this is easily fixed by changing the return 1 case to call strcmp() to compare the path names too. However, for this to work you will need to start by calling realpath on both F and D. The realpath call can be quite expensive (since it may need to hit the disk a number of times).
The special path //foo/bar. POSIX allows path names beginning with // to be special in a way which is somewhat not well defined. Actually I forget the precise level of guarantee about semantics that POSIX provides. I think that POSIX allows //foo/bar and //baz/ugh to refer to the same file. The device/inode check should still do the right thing for you but you may find it does not (i.e. you may find that //foo/bar and //baz/ugh can refer to the same file but have different device/inode numbers).
This answer assumes that we start with an absolute path for both F and D. If this is not guaranteed you may need to do some conversion using realpath() and getcwd(). This will be a problem if the name of the current directory is longer than PATH_MAX (which can certainly happen).
You should simply process .. yourself and remove the previous path component when it's found, so that there are no occurrences of .. in the final string you use for opening files.
Related
What is the safest and most secure way in Go to validate on any platform that a given file path lies within a base path?
The paths are initially provided as strings and use "/" as separators, but they are user-supplied and I need to assume plenty of malicious inputs. Which kind of path normalization should I perform to ensure that e.g. sequences like ".." are evaluated, so I can safely check against the base path? What exceptions are there to watch out for on various file systems and platforms? Which Go libraries are supposed to be safe in that respect?
The results will be fed to external functions like os.Create and sqlite3.Open and any failure to recognize that the base path is left would be a security violation.
I believe you could use filepath.Rel for this (and check if it returns a value not starting with ..).
Rel returns a relative path that is lexically equivalent to targpath
when joined to basepath with an intervening separator. That is,
Join(basepath, Rel(basepath, targpath)) is equivalent to targpath
itself. On success, the returned path will always be relative to
basepath, even if basepath and targpath share no elements. An error is
returned if targpath can't be made relative to basepath or if knowing
the current working directory would be necessary to compute it. Rel
calls Clean on the result.
filepath.Rel also calls filepath.Clean on its input paths, resolving any .s and ..s.
Clean returns the shortest path name equivalent to path by purely
lexical processing. It applies the following rules iteratively until
no further processing can be done:
Replace multiple Separator elements with a single one.
Eliminate each . path name element (the current directory).
Eliminate each inner .. path name element (the parent directory) along with the non-.. element that precedes it.
Eliminate .. elements that begin a rooted path: that is, replace "/.." by "/" at the beginning of a path, assuming Separator is '/'.
You could also use filepath.Clean directly and check for prefix when it's done. Here are some sample outputs for filepath.Clean:
ps := []string{
"/a/../b",
"/a/b/./c/../../d",
"/b",
}
for _, p := range ps {
fmt.Println(p, filepath.Clean(p))
}
Prints:
/a/../b /b
/a/b/./c/../../d /a/d
/b /b
That said, path manipulation shouldn't be the only security mechanism you deploy. If you truly worry about exploits, use defense in depth by sandboxing, creating a virtual file system / containers, etc.
My goal is to, inside my C program, make a new directory. Within this directory, I want to create 15 simple text files. The part that I am stuck on is how to generate the 15 text files inside the newly created directory. I have created the new directory like this:
mkdir("new_dir", 0755);
But I am unsure of how to create a text file within it (in the same program). Any tips for this?
I am guessing you are on some POSIX system. The C11 standard (read n1570) does not know about directories (an abstraction provided by your operating system). If you are on Windows, it has a different WinAPI (you should then use CreateDirectory)
First, your call to mkdir(2) could fail (for a variety of reasons, including the fact that the directory did already exist). And very probably, you actually want to create the directory in the home directory, or document that you are creating it in the current working directory (e.g. leave the burden of some appropriate and prior cd shell builtin to your user). Practically speaking, the directory path should be computed at runtime as a string (perhaps using snprintf(3) or asprintf(3)).
So if you wanted to create a directory in the home directory of the user (remember that ~/foo/ is expanded by the shell during globbing, see glob(7)...; you need to fetch the home directory from environ(7)), you would code something like:
char pathbuf[256];
snprintf(pathbuf, sizeof(pathbuf), "%s/new_dir", getenv("HOME"));
to compute that string. Of course, you need to handle failure (of getenv(3), or of snprintf). I am leaving these checks as an exercise. You might want to keep the result of getenv("HOME") in some automatic variable.
Then you need to make the directory, and check against failure. At the very least (using perror(3) and see errno(3)):
if (mkdir (pathbuf, 0750)) { perror(pathbuf); exit(EXIT_FAILURE); }
BTW, the mode passed to mkdir might not allow every other user to write or access it (if it did, you could have some security vulnerability). So I prefer 0750 to yours 0755.
At last you need to create files inside it, perhaps using fopen(3) before writing into them. So some code like
int i = somenumber();
snprintf(pathbuf, sizeof(pathbuf),
"%s/new_dir/file%d.txt", getenv("HOME"), i);
FILE* f = fopen(pathbuf, "w");
if (!f) { perror(pathbuf); exit(EXIT_FAILURE); };
As Jonathan Leffler wisely commented, there are other ways.
My recommendation is to document some convention. Do you want your program to create a directory in the current working directory, or to create it in some "absolute" path, perhaps related to the home directory of your user? If your program is started by some user (and is not setuid or doesn't have root permissions, see credentials(7)) it is not permitted to create directories or files at arbitrary places (see hier(7)).
If on Linux, you'll better read some system programming book like ALP or newer. If you use a different OS, you should read the documentation of its system API.
I receive filesystem events from fanotify. Sometimes I want to get an absolute path to a file that's being accessed.
Usually, it's not a problem - fanotify_event_metadata contains a file descriptor fd, so I can call readlink on /proc/self/fd/<fd> and get my path.
However, if a path exceeds PATH_MAX readlink can no longer be used - it fails with ENAMETOOLONG. I'm wondering if there's a way to get a file path in this case.
Obviously, I can fstat the descriptor I get from a fanotify and traverse the entire filesystem looking for files with identical device ID and inode number. But this approach is not feasible for me performance-wise (even if I optimize it to ignore paths shorter than PATH_MAX).
I've tried getting a parent directory by reopening fd with O_PATH and calling openat(fd, "..", ...). Obviously, that failed because fd doesn't refer to a directory. I've also tried examining contents of a buffer after a failed readlink call (hoping it contains partial path). That didn't work either.
So far I've managed to get long paths for files inside the working directory of a process that opened them (fanotify events contain a pid of a target process, so I can read /proc/<pid>/cwd and get the path to the root from there). But that is a partial solution.
Is there a way to get an absolute path from a file descriptor without traversing the whole filesystem? Preferably the one that will work with kernel 2.6.32/glibc 2.11.
Update: For the curious. I've figured out why calling readlink("/proc/self/fd/<fd>", ... with a buffer large enough to store the entire path doesn't work.
Look at the implementation of do_proc_readlink. Notice that it doesn't use provided buffer directly. Instead, it allocates a single page and uses it as a temporary buffer when it calls d_path. In other words, no matter how large is buffer, d_path will always be limited to a size of a page. Which is 4096 bytes on amd64. Same as PATH_MAX! The -ENAMETOOLONG itself is returned by prepend when it runs out of mentioned page.
readlink can be used with a link target that's longer than PATH_MAX. There are two restrictions: the name of the link itself must be shorter than PATH_MAX (check, "/proc/self/fd/<fd>" is about 20 characters) and the provided output buffer must be large enough. You might want to call lstat first to figure out how big the output buffer should be, or just call readlink repeatedly with growing buffers.
the limitation of PATH_MAX births from the fact that the unix (or linux, from now) needs to bind the size of parameters passed to the kernel. There's no limit on how deep a file hierarchy can grow, and always there's the possibility to access all files, independent on how deep they are in the filesystem hierarchy. What is actually limited is the lenght of the string you can pass or receive from the kernel representing a file name. This means you cannot create (because you have to pass the target path) a symlink longer than this length, but you can have easily paths far longer this limit.
When you pass a filename to the kernel, you can do that for two reasons, to name a file (or device, or socket, or fifo, or whatever), to open it, etc. YOu do this and your filename goes first to a routine that converts that path into an inode (which is what the kernel manages actually). That routine begins scanning from two possible point in the filesystem hierarchi. Those points are the inode reference of the root inode and the inode reference of the curren working diretory of a process. The selection of which inode to use as departure inode depends on the presence of a leading / character at the begining of the path. From this point, up to PATH_MAX characters will be processed each time, but that can lead us deep enough that we cannot get to the root in one step only...
Suppose you use the path to change your current directory, and do a chdir A/B/C/D/E/.../Z. Once there, you create new directories and do the same thing, chdir AA/AB/AC/AD/AE/.../AZ, then chdir BA/BB/BC/BD/... and so on... there's nothing in the system that forbids you to get so deep in the filesystem (you can try that yourself, I have done and tested before) You can grow to a map that is by far larger than PATH_MAX. But this only mean that you cannot get there directly from the filesystem root. You can go there in steps, as much as the system allows you, and depending on where you fix you root directory (by means of the chroot(2) syscall) or your current directory (by means of the chdir(2) syscall)
probably you have notice (or not) that there's no system call to get your curren working directory path from root... There are several reasons for this:
root inode and curren working inode are two local-to-process concepts. Two processes in the same system can have different working directories, and also different root directories, up to the point that they are able to share nothing in common and no way from one's directory to reach the other.
inode path can be ambiguous. Well, this is not true for a directory, as it is not allowed two hard links to point to the same directory inode (this was possible in older unices, where directories had to be created with the mknod(2) system call, if you have access to some hp-ux v6 or old Unix SysV R4 you can create directories with a ... entry ---pointing to the granparent of a directory or similar things, just being root and knowing how to use the mknod(2) syscall) The idea is that when two links point to the same inode, which (or both) of then goes to the root, which one is the right path from the root inode to the current dir?
curren inode and root can be separated by a path far enough to not fit in the PATH_MAX limit.
there can be several different filesystems (and filesystem types) involved in getting to the root. So this is not something that can be obtained only knowing the stored data in the disks, you must know the mounting table.
For these reasons, there's no direct support in the kernel to know the root path to a file. And also there's no way to get the path (and this is what the pwd(1) command does) than to follow the .. entry and get to the parent directory and search there a link that gets to the inode number of the current dir... and repeat this until the parent inode is the same as the last inode visited. Only then you'll be in the root directory (your root directory, that is different in general of other processes root directories)
Just try this exercise:
i=0
while [ "$i" -lt 10000 ]
do
mkdir dir-$i
cd dir-$i
i=$(expr "$i" + 1)
done
and see how far you can go from the root directory in your hierarchy.
NOTE 1
Another reason to be impossible to get the path to a file from an open descriptor is that you have access only to the inode (the path you used to open(2) it can have no relationship to the actual root path, as you can use symlinks and relative to the working directory, or changed root dir in between the open call and the time you want to access the path, it can even not exist, as you can have unlink(2)d it) The inode information has no reference to the path to the inode, as there can be multiple (even millions) paths to a file. In the inode you have only a ref count, which means the number of paths that actually finish on that inode.
How can I search for a file within a directory and its sub-directories in C?
I'm not allowed to use find and I must use opendir , readdir and stat.
I want to perform something like the command ls -ln if the file indeed exists.
For traversing the directories, you will need: opendir(3), readdir(3), and closedir(3).
For checking the type of file (to see if it's a directory and if you should recursively search within it) you will need stat(2).
You will want to check
(struct stat).st_mode & S_IFDIR
to see if the file is a directory. See <sys/stat.h> for more information.
If we try to write a small piece of code in C then we can do this search activity easily.
Suppose you need to search abc.txt in a /home/Jack/ then just open a file stream and pass the file path as a parameter.
Now when this statement will be executed, it will try to open the existing file. This API will return non zero if the file exists otherwise it is returned -1 or zero.
You've already provided the basic answer: opendir/readdir/closedir. As you walk the directory entries, you check whether each refers to a file or a directory. Those that refer to directories, you traverse as well (typically recursively). For those that refer to files, you compare their names to the file(s) you're looking for, and see if you've found it.
One other minor detail: you probably also want to check for symbolic links. A symbolic link can (for example) refer to a parent directory, which could/can lead to infinite recursion. You may want to ignore symbolic links completely, or you may want to keep a list of directories you've already at least started to traverse, and resolve/traverse what's in the symbolic link only if it's not already in the list.
consider the following task :
1) read a target directory contents, pass each found dirent structure to some filter function and remember filtered elements somehow for the later processing
2) some time later, iterate through the filtered elements and process them (do some I/O)
The most obvious way is to save names of sub-directories.
However, I want to keep memory usage to the minimum and to avoid additional I/O.
According to POSIX manuals, I can save position of each directory entry using telldir() and restore them later using seekdir(). To keep these positions valid, I have to keep target directory opened and to not use rewinddir() call.
Keeping a directory stream open and storing a list of dir positions(long int`s) seems to be an appropriate solution.
However, it is unclear whether stored positions remain valid after folder modification. I didn`t found any comments on these conditions in the POSIX standard.
1) Whether stored positions remain valid when only new directory entries are added/removed ?
2) Whether stored positions of unmodified directory entries remain valid in case of some of the filtered directory entries were removed ?
3) Is it possible for the stored position to point to another directory entry after folder modification ?
It is easy to test and find out the answer on these questions for the particular system, but I would like to know what standards say on this topic
Thank you
Until you call rewinddir or close and reopen the directory, your view of the directory contents should not change. Sorry I don't have the reference handy. I'll find it later if you need it.