Check if file is *child of folder - c

I have a directory name and a subpath, e.g. "./files" and "/example.txt". While the directory can be arbitrarily placed in the filesystem, I need to make sure that directory+subpath ("./files/example.txt") actually is inside the given directory. So this example would be valid, while subpath "/../example.txt" would be invalid because it is neither a child of the directory, nor a grandchild, etc. Soft-links leading outside of the directory are allowed.
How should I perform this test in C?
My first guess was to call realpath(directory_subpath) and compare the start of the result with realpath(directory), but after reading about the problems with PATH_MAX I'm a bit unsure about that, and it is also likely to cause problems with soft-links.
My second idea is to simply check whether the subpath starts with /../ and, if it does, treat it as invalid. If /../ appears anywhere else in the subpath, the directory name before it is removed (working left to right, repeating until the path turns out to be invalid or the end of the path name is reached).
The subpath might be given with malicious intent, so I want to be really, really sure about this. Is my second approach safe? Is there a different, better way?

The second approach is safe if you check for /.. (without trailing slash).
I would just forbid .. in the subpath: the cases when .. is really necessary and is not malicious are rather rare.
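A minimal sketch of the "just forbid .." check, assuming the subpath uses "/" as its separator. The function name has_dotdot_component is made up for illustration. It rejects only components that are exactly "..", so a file name such as "..hidden" still passes.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if any '/'-separated component of subpath is exactly "..".
 * Names that merely contain dots ("..hidden", "a..b") are not rejected. */
bool has_dotdot_component(const char *subpath)
{
    const char *p = subpath;

    while (*p != '\0') {
        /* Skip the separators in front of the next component. */
        while (*p == '/')
            p++;

        /* Length of the component that starts at p (0 at end of string). */
        size_t len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;

        p += len;
    }
    return false;
}
```

With this check, "/example.txt" is accepted while "/../example.txt" and "/a/.." are rejected, including the trailing "/.." case that a search for "/../" alone would miss.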

Related

What is the safest way to check that a file resides within a base directory?

What is the safest and most secure way in Go to validate on any platform that a given file path lies within a base path?
The paths are initially provided as strings and use "/" as separators, but they are user-supplied and I need to assume plenty of malicious inputs. Which kind of path normalization should I perform to ensure that e.g. sequences like ".." are evaluated, so I can safely check against the base path? What exceptions are there to watch out for on various file systems and platforms? Which Go libraries are supposed to be safe in that respect?
The results will be fed to external functions like os.Create and sqlite3.Open and any failure to recognize that the base path is left would be a security violation.
I believe you could use filepath.Rel for this (and check if it returns a value not starting with ..).
Rel returns a relative path that is lexically equivalent to targpath
when joined to basepath with an intervening separator. That is,
Join(basepath, Rel(basepath, targpath)) is equivalent to targpath
itself. On success, the returned path will always be relative to
basepath, even if basepath and targpath share no elements. An error is
returned if targpath can't be made relative to basepath or if knowing
the current working directory would be necessary to compute it. Rel
calls Clean on the result.
filepath.Rel also calls filepath.Clean on its input paths, resolving any .s and ..s.
Clean returns the shortest path name equivalent to path by purely
lexical processing. It applies the following rules iteratively until
no further processing can be done:
Replace multiple Separator elements with a single one.
Eliminate each . path name element (the current directory).
Eliminate each inner .. path name element (the parent directory) along with the non-.. element that precedes it.
Eliminate .. elements that begin a rooted path: that is, replace "/.." by "/" at the beginning of a path, assuming Separator is '/'.
You could also use filepath.Clean directly and check for prefix when it's done. Here are some sample outputs for filepath.Clean:
ps := []string{
    "/a/../b",
    "/a/b/./c/../../d",
    "/b",
}
for _, p := range ps {
    fmt.Println(p, filepath.Clean(p))
}
Prints:
/a/../b /b
/a/b/./c/../../d /a/d
/b /b
That said, path manipulation shouldn't be the only security mechanism you deploy. If you truly worry about exploits, use defense in depth by sandboxing, creating a virtual file system / containers, etc.

How to determine if a path is inside a directory? (POSIX)

In C, using POSIX calls, how can I determine if a path is inside a target directory?
For example, a web server has its root directory in /srv, this is getcwd() for the daemon.
When parsing a request for /index.html, it returns the contents of /srv/index.html.
How can I filter out requests for paths outside of /srv?
/../etc/passwd,
/valid/../../etc/passwd,
etc.
Splitting the path at / and rejecting any array containing .. will break valid accesses such as /srv/valid/../index.html.
Is there a canonical way to do this with system calls? Or do I need to manually walk the path and count directory depth?
There's always realpath:
The realpath() function shall derive, from the pathname pointed to by *file_name*, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.' , '..' , or symbolic links.
Then compare what realpath gives you with your desired root directory and see if they match up.
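A sketch of that comparison, under a few assumptions: the helper names path_has_root and resolves_inside are invented here, both paths must exist (realpath fails otherwise), and realpath is called with a NULL buffer, which POSIX.1-2008 allows and which sidesteps a fixed PATH_MAX buffer. The prefix test also checks the character after the root, so "/srvx" is not mistaken for something inside "/srv".

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if path equals root or lies below it.
 * Both arguments are expected to be canonical absolute paths. */
bool path_has_root(const char *root, const char *path)
{
    size_t n = strlen(root);

    /* Ignore trailing slashes on root; for "/" this leaves n == 0. */
    while (n > 0 && root[n - 1] == '/')
        n--;

    if (strncmp(root, path, n) != 0)
        return false;

    /* The match must end on a component boundary. */
    return path[n] == '\0' || path[n] == '/';
}

/* Returns 1 if path resolves to something inside root, 0 if it does not,
 * and -1 if either path cannot be resolved (for example, it does not exist). */
int resolves_inside(const char *root, const char *path)
{
    char *real_root = realpath(root, NULL);  /* malloc'ed by realpath */
    char *real_path = realpath(path, NULL);
    int result = -1;

    if (real_root != NULL && real_path != NULL)
        result = path_has_root(real_root, real_path) ? 1 : 0;

    free(real_root);  /* free(NULL) is a no-op */
    free(real_path);
    return result;
}
```

Because realpath follows symbolic links, a link under the root that points elsewhere is reported as outside. A caller would normally treat -1 the same as 0 and refuse the request.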
You could also clean up the filename by hand by expanding the double-dots before you prepend the "/srv". Split the incoming path on slashes and walk through it piece by piece. If you get a "." then remove it and move on; if you get a "..", then remove it and the previous component (taking care not to go past the first entry in your list); if you get anything else, just move on to the next component. Then paste what's left back together with slashes between the components and prepend your "/srv/". So if someone gives you "/valid/../../etc/passwd", you'll end up with "/srv/etc/passwd" and "/where/is/../pancakes/house" will end up as "/srv/where/pancakes/house".
That way you can't get outside "/srv" (except through symbolic links of course) and an incoming "/../.." will be the same as "/" (just like in a normal file system). But you'd still want to use realpath if you're worried about symbolic links under "/srv".
Working with the path name component by component would also allow you to break the connection between the layout you present to the outside world and the actual file system layout; there's no need for "/this/that/other/thing" to map to an actual "/srv/this/that/other/thing" file anywhere, the path could just be a key in some sort of database or some sort of namespace path to a function call.
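The component-by-component cleanup described above could look roughly like this. It is a sketch only: the name join_normalized and its signature are made up, the processing is purely lexical (symbolic links are not examined), and the caller supplies the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Writes root followed by the cleaned-up request into out.
 * "." components are dropped and ".." removes the previous component,
 * but never anything that belongs to root itself.
 * Returns 0 on success, -1 if the result does not fit in outsize bytes. */
int join_normalized(const char *root, const char *request,
                    char *out, size_t outsize)
{
    size_t base = strlen(root);

    /* Ignore trailing slashes on root. */
    while (base > 0 && root[base - 1] == '/')
        base--;
    if (base + 2 > outsize)          /* room for root, one '/', and NUL */
        return -1;

    memcpy(out, root, base);
    size_t len = base;
    out[len] = '\0';

    const char *p = request;
    while (*p != '\0') {
        while (*p == '/')            /* skip separators */
            p++;
        size_t n = strcspn(p, "/");  /* length of this component */
        if (n == 0)
            break;

        if (n == 1 && p[0] == '.') {
            /* "." : nothing to do */
        } else if (n == 2 && p[0] == '.' && p[1] == '.') {
            /* "..": strip the last component, stopping at root */
            while (len > base && out[len - 1] != '/')
                len--;
            if (len > base)
                len--;               /* drop the separator as well */
        } else {
            if (len + 1 + n + 1 > outsize)
                return -1;
            out[len++] = '/';
            memcpy(out + len, p, n);
            len += n;
        }
        out[len] = '\0';
        p += n;
    }

    if (len == base) {               /* nothing left: refer to root itself */
        out[len++] = '/';
        out[len] = '\0';
    }
    return 0;
}
```

With root "/srv", the request "/valid/../../etc/passwd" produces "/srv/etc/passwd" and "/../.." produces "/srv/", matching the behaviour described in the answer.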
To determine if a file F is within a directory D, first stat D to determine its device number and inode number (members st_dev and st_ino of struct stat).
Then stat F to determine if it is a directory. If not, call dirname to determine the name of the directory containing it. Set G to the name of this directory. If F was already a directory, set G=F.
Now, F is within D if and only if G is within D. Next we have a loop.
while (1) {
    // d_statinfo is the struct stat obtained for D
    if (samefile(d_statinfo.st_dev, d_statinfo.st_ino, G)) {
        return 1; // F was within D
    } else if (0 == strcmp("/", G)) {
        return 0; // F was not within D.
    }
    G = dirname(G);
}
The samefile function is simple:
int samefile(dev_t ddev, ino_t dino, const char *path) {
    struct stat st;
    if (0 == stat(path, &st)) {
        return ddev == st.st_dev && dino == st.st_ino;
    } else {
        // stat failed: treat as "not the same file"
        // (or return a distinct error value, but also change the caller to detect it)
        return 0;
    }
}
This will work on POSIX filesystems. But many filesystems are not POSIX. Problems to look out for include:
Filesystems where the device/inode are not unique. Some FUSE filesystems are examples of this; they sometimes make up inode numbers when the underlying filesystems don't have them. They shouldn't re-use inode numbers, but some FUSE filesystems have bugs.
Broken NFS implementations. On some systems all NFS filesystems have the same device number. If they pass through the inode number as it exists on the server, this could cause a problem (though I've never seen it happen in practice).
Linux bind mount points. If /a is a bind mount of /b, then /a/1 correctly appears to be inside /a, but with the implementation above, /b/1 also appears to be inside /a. I think that's probably the correct answer. However, if this is not the result you prefer, this is easily fixed by changing the return 1 case to call strcmp() to compare the path names too. However, for this to work you will need to start by calling realpath on both F and D. The realpath call can be quite expensive (since it may need to hit the disk a number of times).
The special path //foo/bar. POSIX allows path names beginning with // to be special in a way which is somewhat not well defined. Actually I forget the precise level of guarantee about semantics that POSIX provides. I think that POSIX allows //foo/bar and //baz/ugh to refer to the same file. The device/inode check should still do the right thing for you but you may find it does not (i.e. you may find that //foo/bar and //baz/ugh can refer to the same file but have different device/inode numbers).
This answer assumes that we start with an absolute path for both F and D. If this is not guaranteed you may need to do some conversion using realpath() and getcwd(). This will be a problem if the name of the current directory is longer than PATH_MAX (which can certainly happen).
You should simply process .. yourself and remove the previous path component when it's found, so that there are no occurrences of .. in the final string you use for opening files.

What corner cases must we consider when parsing $PATH on Linux?

I'm working on a C application that has to walk $PATH to find full pathnames for binaries, and the only allowed dependency is glibc (i.e. no calling external programs like which). In the normal case, this just entails splitting getenv("PATH") by colons and checking each directory one by one, but I want to be sure I cover all of the possible corner cases. What gotchas should I look out for? In particular, are relative paths, paths starting with ~ meant to be expanded to $HOME, or paths containing the : char allowed?
One thing that once surprised me is that the empty string in PATH means the current directory. Two adjacent colons or a colon at the end or beginning of PATH means the current directory is included. This is documented in man bash for instance.
It also is in the POSIX specification.
So
PATH=:/bin
PATH=/bin:
PATH=/bin::/usr/bin
all mean that the current directory is in PATH.
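One practical consequence is that strtok() is a poor fit for splitting PATH, because it silently skips empty fields. Below is a sketch of an iterator that keeps them and reports each empty entry as "."; the name next_path_entry and its interface are invented for this example. Treating a completely empty PATH string as a single empty entry is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Reads one entry from a colon-separated list starting at cursor.
 * On return, *dir points at the entry (not NUL-terminated) and *len is its
 * length. An empty entry is reported as "." with length 1.
 * Returns the cursor for the next call, or NULL after the last entry. */
const char *next_path_entry(const char *cursor, const char **dir, size_t *len)
{
    size_t n = strcspn(cursor, ":");

    if (n == 0) {
        *dir = ".";      /* empty entry means the current directory */
        *len = 1;
    } else {
        *dir = cursor;
        *len = n;
    }
    return cursor[n] == ':' ? cursor + n + 1 : NULL;
}

/* Typical loop:
 *
 *     const char *cur = getenv("PATH");
 *     while (cur != NULL) {
 *         const char *dir; size_t len;
 *         cur = next_path_entry(cur, &dir, &len);
 *         // use the first len bytes of dir
 *     }
 */
```

For "/bin::/usr/bin" this yields "/bin", ".", "/usr/bin"; for "/bin:" it yields "/bin" and then ".".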
I'm not sure this is a problem with Linux in general, but make sure that your code works if PATH has some funky (like, UTF-8) encoding to deal with directories with fancy letters. I suspect this might depend on the filesystem encoding.
I remember working on a bug report from a Russian user who had fancy letters in his user name (and hence in his home directory name, which appeared in PATH).
This is minor but I'll add it since it hasn't already been mentioned. $PATH can include both absolute and relative paths. If you're crawling the paths list by chdir(2)ing into each directory, you need to keep track of the original working directory (getcwd(3)) and chdir(2) back to it at each iteration of the crawl.
The existing answers cover most of it, but it's worth covering parts of the question that weren't answered yet:
$ and ~ are not special in the value of $PATH.
If $PATH is not set at all, execvp() will use a default value.

How to get the parent directory of the current folder in a C program?

I am trying to get the parent directory of the current folder in which I have the program.
I need to use it in my C program. I tried solving it with string functions, but I feel there must be a better and simpler way. E.g. if the path is “C:\Application\Config”, then I want to get “C:\Application”, just the parent path.
Can someone please help me with this?
Thanks,
Priyanka
To truncate a string in place at its last backslash:
char pathname[MAX_PATH];
GetCurrentDirectory(MAX_PATH, pathname);
char* last_backslash = strrchr(pathname, '\\');
if (last_backslash)
{
    *last_backslash = '\0';
}
Sometimes just appending \.. will suffice, if you are not worried about exceeding MAX_PATH.
It's difficult to answer your question since you haven't really specified what you want to -do- with the path once you have it. If you want to change to the new directory, that's easy, you just use whatever function you'd normally use to change directory but pass it ".." instead of a full path - that's because on all sane filesystems, ".." is a 'magic' directory which exists inside all other directories and refers to the parent thereof.
If you want to perform some string function on the new directory before jumping to it, your problem instantly becomes a lot more difficult to solve. The way I'd go about doing it mirrors RichieHindle's solution - strip the current directory away from the full path then you're left with the parent directory's path with which you can muck about to your heart's content.
On Windows, the API function you need is called GetCurrentDirectory().
http://msdn.microsoft.com/en-us/library/aa364934%28v=vs.85%29.aspx

What can I do if getcwd() and getenv("PWD") don't match?

I have a build system tool that is using getcwd() to get the current working directory. That's great, except that sometimes people have spaces in their paths, which isn't supported by the build system. You'd think that you could just make a symbolic link:
ln -s "Directory With Spaces" DirectoryWithoutSpaces
And then be happy. But unfortunately for me, getcwd() resolves all the symbolic links. I tried to use getenv("PWD"), but it is not pointing at the same path as I get back from getcwd(). I blame make -C for not updating the environment variable, I think. Right now, getcwd() gives me back a path like this:
/Users/carl/Directory With Spaces/Some/Other/Directories
And getenv("PWD") gives me:
/Users/carl/DirectoryWithoutSpaces
So - is there any function like getcwd() that doesn't resolve the symbolic links?
Edit:
I changed
make -C Some/Other/Directories
to
cd Some/Other/Directories ; make
And then getenv("PWD") works. If there's no other solution, I can use that.
According to the Advanced Programming in the UNIX Environment bible by Stevens, p.112:
Since the kernel must maintain knowledge of the current working directory, we should be able to fetch its current value. Unfortunately, all the kernel maintains for each process is the i-node number and device identification for the current working directory. The kernel does not maintain the full pathname of the directory.
Sorry, looks like you do need to work around this in another way.
There is no way for getcwd() to determine the path you followed via symbolic links. The basic implementation of getcwd() stats the current directory '.', and then opens the parent directory '..' and scans the entries until it finds the directory name with the same inode number as '.' has. It then repeats the process upwards until it finds the root directory, at which point it has the full path. At no point does it ever traverse a symbolic link. So the goal of having getcwd() calculate the path followed via symlinks is impossible, whether it is implemented as a system call or as a library function.
The best resolution is to ensure that the build system handles path names containing spaces. That means quoting pathnames passed through the shell. C programs don't care about the spaces in the name; it is only when a program like the shell interprets the strings that you run into problems. (Compilers implemented as shell scripts that run pre-processors often have problems with pathnames that contain spaces - speaking from experience.)
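If the symlinked name is still wanted when it happens to be available, one option is to trust $PWD only when it provably names the current directory and fall back to getcwd() otherwise. The sketch below compares device and inode numbers, the same pair the kernel tracks for the working directory. The function name trusted_pwd is hypothetical.

```c
#include <stdlib.h>
#include <sys/stat.h>

/* Returns the value of $PWD if it is an absolute path that refers to the
 * same directory as ".", otherwise NULL. The returned pointer is the one
 * from getenv() and must not be freed. */
const char *trusted_pwd(void)
{
    const char *pwd = getenv("PWD");
    struct stat pwd_st, dot_st;

    /* Unset or relative values cannot be trusted. */
    if (pwd == NULL || pwd[0] != '/')
        return NULL;

    /* stat() follows symbolic links, so a symlinked name compares equal
     * to the directory it points to. */
    if (stat(pwd, &pwd_st) != 0 || stat(".", &dot_st) != 0)
        return NULL;

    if (pwd_st.st_dev != dot_st.st_dev || pwd_st.st_ino != dot_st.st_ino)
        return NULL;  /* stale $PWD, e.g. after make -C */

    return pwd;
}
```

In the make -C case from the question, $PWD still names the parent directory, so this returns NULL and the caller would use getcwd() with its resolved path. It detects the mismatch but cannot recover the symlinked name.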

Resources