What can I do if getcwd() and getenv("PWD") don't match? - c

I have a build system tool that is using getcwd() to get the current working directory. That's great, except that sometimes people have spaces in their paths, which isn't supported by the build system. You'd think that you could just make a symbolic link:
ln -s "Directory With Spaces" DirectoryWithoutSpaces
And then be happy. But unfortunately for me, getcwd() resolves all the symbolic links. I tried to use getenv("PWD"), but it is not pointing at the same path as I get back from getcwd(). I blame make -C for not updating the environment variable, I think. Right now, getcwd() gives me back a path like this:
/Users/carl/Directory With Spaces/Some/Other/Directories
And getenv("PWD") gives me:
/Users/carl/DirectoryWithoutSpaces
So - is there any function like getcwd() that doesn't resolve the symbolic links?
Edit:
I changed
make -C Some/Other/Directories
to
cd Some/Other/Directories ; make
And then getenv("PWD") works.. If there's no other solution, I can use that.

According to the Advanced Programming in the UNIX Environment bible by Stevens, p.112:
Since the kernel must maintain knowledge of the current working directory, we should be able to fetch its current value. Unfortunately, all the kernel maintains for each process is the i-node number and device identification for the current working directory. The kernel does not maintain the full pathname of the directory.
Sorry, looks like you do need to work around this in another way.

There is no way for getcwd() to determine the path you followed via symbolic links. The basic implementation of getcwd() stats the current directory '.', and then opens the parent directory '..' and scans the entries until it finds the directory name with the same inode number as '.' has. It then repeats the process upwards until it finds the root directory, at which point it has the full path. At no point does it ever traverse a symbolic link. So the goal of having getcwd() calculate the path followed via symlinks is impossible, whether it is implemented as a system call or as a library function.
The best resolution is to ensure that the build system handles path names containing spaces. That means quoting pathnames passed through the shell. C programs don't care about the spaces in the name; it is only when a program like the shell interprets the strings that you run into problems. (Compilers implemented as shell scripts that run pre-processors often have problems with pathnames that contain spaces - speaking from experience.)

Related

Copying a directory using sockets

I'm writing a program in C that sends files across the network using sockets. This works fine for files - they are read into a buffer and then written onto the socket. They are picked up at the other end by reversing this process.
However, how can this apply to directories? I also want to copy directories, keeping the permissions the same (so I don't think mkdir will work). At the moment when I try to run this on a directory, it says the size is -1. How is a directory represented?
To be clear, for example, if I want my program to copy /tmp across the network, it will do this:
/tmp/1.txt - OK
/tmp/2.txt - OK
/tmp/dir/ - Skip
/tmp/dir/3.txt - Can't write to path
There are several possibilities. It would fit fairly will with what you have already to tar the directory to transfer, send the resulting archive across the network, and untar on the other side.
Alternatively, you can walk the directory tree recursively. For each directory you need transfer only the name and whichever attributes you want to preserve, but then you must list the directory contents (probably via readdir()) and transfer each member.
By the way, don't neglect to think about how you're going to handle links, both symbolic ones and hard ones. And if you want your program to be really robust then consider also what to do with special files such as device files and FIFOs.
I guess it is homework, otherwise why not use FTP, scp, rsync, unison etc.
To test if a file path is a plain file, a device, a directory, etc etc... use
stat(2)
To read a directory, use opendir(3) then loop on readdir(3) (then of course closedir). You don't need to know how a directory is represented.
You probably should be interested in nftw(3) to recursively traverse a file tree.
To make one directory, use mkdir(2)
You should read Advanced Linux Programming
BTW, this answer contains useful information too...

own shell in C using execv

I am trying to build my own shell in C as part of a class project. We are required to use execv and implement our own path. For better understanding here is the question:
The list of paths is empty by default, but may grow to any arbitrary size. You should implement a built-in command to control this variable:
path [+|- /some/dir]
path (without arguments) displays all the entries in the list separated by colons, e.g. "/bin:/usr/bin".
path + /some/dir appends the given pathname to the path list.
path - /some/dir removes the given pathname from the path list.
I have misread the assignment and used execvp so far. Please can you shed some light on how to create my own path variable, and for each command executed search the directory it is in and add it to the path? Or is there any simple shell written using execv I can take a look at?
I saw http://linuxgazette.net/111/ramankutty.html, but I found the search a little too complex, and he uses execve.
so far i have char *mypath variable which is null initially. but the user can add or remove using path + some/dir or path - /some/dir. syntax for execv is execv("/some/dir", argv) how do i search my path for the executable and pass it to execv....for example mypath=/bin/ls ; when i pass execv(mypath, argv) it does not work...so how do i pass the path to execv?
I'm guessing the reason you are supposed to use excev is precisely that it doesn't take into account the path of the environment, but the call has to provide a full path to the function.
Since this is a class project, you are supposed to write your code - writing code is how you learn how to do things, much more than copy-and-paste from the internet, so I'm not going to write code to solve the problem but instead describe the solution.
You will need to keep a list of path entries - adjusted through the path + some/dir and path - some/dir mechanism - so these commands need to be handled inside your shell, of course, and they should add/remove from your list of path entries.
When you then come to executing something, say "mycommand" is entered, you will have to scan the list of path entries, and check if there is a file by the name "mycommand" in the directory specified by the path entry that can be executed (has execute bit set in the directory entry). If so, call execv on the string of current path entry and "mycommand" concatenated. (You can produce the concatenated string and use the stat function to get the information about the file, for example)
Do check for errors, and report if something goes wrong.
Please do not try to find someone else's shell on the internet. That is not how you learn, and if you don't actually learn from the class exercises, you will most likely not succeed once you finish school - and that's ultimately WHY you are going to school, right?

How to find your directory manually in Linux?

I am about to write a command that shows the current directory in linux. I know I can use the "pwd" command, but that's what I need to implement myself!... in other words, when entering the so called "findme" command, I want to return back the directory that I am in at the moment. I have managed to create the my "findme" command (which is very simple,I know), but how am I supposed to know where in I am after executing the command, in order to show the whole directory?
It seems a rather odd requirement: 'implement pwd' (this isn't homework, is it?). Can you give a little more context?
Probably relevant bits of information are:
the directory .. is linked to the parent directory, and . to the current one; so...
changing directory to .. goes up one level in the filesystem (unless you're at the top); plus
every directory will have an 'inode number', so if you consider a directory foo, then both it and the directory foo/. will have the same inode number.
I don't know how pwd actually does it, but I'd lay money you could reimplement it with this information.
XV6 isn't at all the same as GNU/Linux. pwd was not implemented in UNIX V6, that's why it's a good exercise. You would probably want to implement getcwd() as a syscall.

Prevent accessing files outside of given working directory

I am trying to prevent the access on files outside of a given working directory.
My first attempt was to use chdir and chroot, but chroot can only be used by root users.
Is there any other possibility? I have heard something about another one, but I can't remember.
Perhaps a simple function to check if the path is outside of the working directory or second argument.
Some details about the program:
shall be run on Linux
simple shell programm without any interactive elements
takes a directory argument, which is the working directory
Thanks for any advices.
EDIT:
After some research I found different aproachments, but I can't use any of them.
pivot_root
set_fs_root (linux kernel)
Is there any possibility to use that?
Perhaps there is a possibility to open a file which is contained by a given directory. So I call the function with the argument file path and the "root" path where to look.
I'm assuming that you're on a Linux/MacOSX platform. There are a couple of ways. One is to create a special user for your program who owns that directory, but doesn't have write permissions to anything else in the system*. The other option is to use a program like SELinux to only allow certain operations to the program, but that seems like overkill.
*: You must always give the user read permissions. How will your program run without read access to glibc?
You might want to look into a restricted shell; I think most of the common shells have options for a restricted mode that disables cd, prevents changes to certain environment variables, and some other things. For pdksh, it would be /bin/ksh -r. The option differ for other shells, though, so read the appropriate manual page.

What corner cases must we consider when parsing $PATH on Linux?

I'm working on a C application that has to walk $PATH to find full pathnames for binaries, and the only allowed dependency is glibc (i.e. no calling external programs like which). In the normal case, this just entails splitting getenv("PATH") by colons and checking each directory one by one, but I want to be sure I cover all of the possible corner cases. What gotchas should I look out for? In particular, are relative paths, paths starting with ~ meant to be expanded to $HOME, or paths containing the : char allowed?
One thing that once surprised me is that the empty string in PATH means the current directory. Two adjacent colons or a colon at the end or beginning of PATH means the current directory is included. This is documented in man bash for instance.
It also is in the POSIX specification.
So
PATH=:/bin
PATH=/bin:
PATH=/bin::/usr/bin
All mean the current directory is in PATH
I'm not sure this is a problem with Linux in general, but make sure that your code works if PATH has some funky (like, UTF-8) encoding to deal with directories with fancy letters. I suspect this might depend on the filesystem encoding.
I remember working on a bug report of some russian guy who had fancy letters in his user name (and hence, his home directory name which appeared in PATH).
This is minor but I'll added it since it hasn't already been mentioned. $PATH can include both absolute and relative paths. If your crawling the paths list by chdir(2)ing into each directory, you need to keep track of the original working directory (getcwd(3)) and chdir(2) back to it at each iteration of the crawl.
The existing answers cover most of it, but it's worth covering parts of the question that wasn't answered yet:
$ and ~ are not special in the value of $PATH.
If $PATH is not set at all, execvp() will use a default value.

Resources