Is it possible to create an unlinked file on a selected filesystem? - c

Basically, the same result as creating a temporary file in the desired file system, opening it, and then unlinking it.
Even better, though unlikely, if this could be done without creating an inode that is visible to other processes.

The ability to do so is OS-specific, since the relevant POSIX function calls all result in a link being generated. Linux in particular has allowed, since version 3.11, the use of O_TMPFILE in the flags argument of open(2) in order to create an anonymous file in a given directory.

There are several POSIX APIs at your disposal:
mkstemp - generates a unique temporary filename from
template, creates and opens the file, and returns an open file
descriptor for the file.
tmpfile - opens a unique temporary file in binary
read/write (w+b) mode. The file will be automatically deleted when
it is closed or the program terminates.
Both of these functions do create files on the filesystem. Creating an inode is unavoidable, if you want to use a real file.
The first provides you a file descriptor for making low-level system calls, like read and write. The second gives you a FILE* for all of the <stdio.h> APIs.
If you don't need/desire an actual file on disk, you should consider the memory stream APIs provided by POSIX.1-2008.
open_memstream() - opens a stream for writing to a buffer.
The buffer is dynamically allocated (as with malloc(3)), and
automatically grows as required.

libtmpfilefd : create a temporary unnamed file seem to fullfill your requirements
Looking at the source file this function create a temporary file with mkstemp then unlink the file right after

Related

mkstemp() - is it safe to close descriptor and reopen it again?

When generating a temporary file name using mkstemp(), is it safe to immediately call close() on the file descriptor returned by mkstemp(), store the file name generated by mkstemp() somewhere and use it (at a much later time) to open the file again for writing a temporary file? Or will this temporary file name become available again as soon as I call close() on it?
The reason why I'm asking is that I'm wondering why mkstemp() returns a file descriptor at all. If it is safe to close() the descriptor immediately, why does it return a descriptor at all? mkstemp() could close it then on its own and just give me a file name.
No. In between the time when you use mkstemp() to create the file and the time when you reopen it, your adversary may have removed the file you created and put a symlink in its place pointing to somewhere else altogether. This is a TOCTOU — Time of Check, Time of Use — vulnerability which the use of mkstemp() largely avoids, provided you keep the file descriptor open.
Once you close the file descriptor, all bets are off in a sufficiently hostile environment.
Note that even if you keep the file descriptor open, an adversary might remove the file, or rename it, and then create their own file (symlink, directory) in its place. The file descriptor remains valid. You could use stat() to get the name information and the fstat() to get the file descriptor information, and if the two match (st_dev and st_ino fields), then you're probably still OK. If they differ, someone's got at the file — if you rename it, you may be renaming their file rather than the one you created.
While the file originally created by mkstemp() still exists, the name will not be regenerated. In general, successive calls to mkstemp() will create distinct names anyway, but the name is guaranteed to be unique at the moment of creation (see the O_EXCL flag for open()).
And just in case you're wondering, no — there isn't a way to associate a name with a file descriptor (there is no hypothetical int flink(int fd, const char *name) system call). There was a question about that on one of the Stack Exchange sites a while ago, and the answer was definitely negative, with references to the Linux Kernel mailing list and so on. One such question is Is it possible to recreate a file from an opened file descriptor?, but I think there was a more thorough version of the question too.
The mkstemp function specifically uses descriptors instead of filenames to avoid race conditions that are commonly associated with its predecessors such as mktemp. In fact, the "s" in "mkstemp" means "secure", because the race condition can be a source of vulnerability (e.g. if you use the temporary file to store JIT code, for example, and someone guessing/stomping the file before you open it could cause your application to load/run the provided code rather than the code that your program generates).
Once you close the descriptor, nothing prevents another application from writing a file with the same name, so please don't do that. You should retain the descriptor for as long as the temporary file is needed (and close the descriptor once the temporary file is no longer going to be used by your program).

Difference between creating a duplicate file descriptor using dup() and creating a hard link?

I just tried out this program where I use dup to duplicate the file desciptor of an opened file.
I had made a hard link to this same file and I opened the same file to read the contents of the file in the program.
My question is what is the difference?
I understand that dup gives me a run time abstraction to the file and that hard link refers more to the filsystem implementation but I do not understand the need for use of one over the other.
What are the advantages of using one over the other?
Why can't we explicitly refer to the hard link if we want to refer to the same file locations instead of creating a file descriptor and vice versa?
I am using Linux and the standard C library.
Hard links work on i-nodes, dup works on opened file descriptors. These are different animals.
A file is mostly an inode, with directory entries pointing to that inode (so some file can have more than one name thru hard links, other files can have no name at all: a temporary file still opened but unlinked has an i-node refered by an opened file descriptor, but no more any name). I-nodes exist for the duration of the file and are written to disks.
A file descriptor only exist in processes (in kernel memory only, not on disk) so can't be written to disk (you could only write its number, which usually don't make any sense).
A file descriptor knows (inside the kernel) its inode, but also some more state, notably the current offset.
You could have two file descriptors working on the same file (the same inode, perhaps by open-ing two different hardlinked or symlinked paths to it) but having different state (e.g. different file position or offset).
If using dup(2) syscall, the two file descriptors share the same state (just after the dup) in particular share the same file offset or position.
If using link(2) syscall, the two directory entries point to the same inode. They need to be on the same filesystem.
And a symlink(2) syscall creates a new inode (and a new file) which refers to the symbolic name. Read other man pages about path_resolution(7) and symlink(7).
A hard link is just a way to have the same
file in two different directories. It is useful for saving some disk space.
Using fdup lets you have two different file descriptors in your program that point to the same file. It is useful if you want to duplicate some kind of logical object that wraps a file descriptor.
The main difference is that a hard link is persistent and a duplicated file descriptor only lasts as long as the process. Plus the reasons already given.

How do I open a directory at kernel level using the file descriptor for that directory?

I'm working on a project where I must open a directory and read the files/directories inside at kernel level. I'm basically trying to find out how ls is implemented at kernel level.
Right now I've figured out how to get a file descriptor for a directory using sys_open() and the O_DIRECTORY flag, but I don't know how to read the fd that I receive. If anyone has any tips or other suggestions I'd appreciate it. (Keep in mind this has to be done at kernel level).
Edit:For a long story short, For a school project I am implementing file/directory attributes. Where I'm storring the attributes is a hidden folder at the same level of the file with a given attribute. (So a file in Desktop/MyFolder has an attributes folder called Desktop/MyFolder/.filename_attr). Trust me I don't care to mess around in kernel for funsies. But the reason I need to read a dir at kernel level is because it's apart of project specs.
To add to caf's answer mentioning vfs_readdir(), reading and writing to files from within the kernel is is considered unsafe (except for /proc, which acts as an interface to internal data structures in the kernel.)
The reasons are well described in this linuxjournal article, although they also provide a hack to access files. I don't think their method could be easily modified to work for directories. A more correct approach is accessing the kernel's filesystem inode entries, which is what vfs_readdir does.
Inodes are filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems).
Notice that vfs_readdir() expects a file * parameter. To obtain a file structure pointer from a user space file descriptor, you should utilize the kernel's file descriptor table.
The kernel.org files documentation says the following on doing so safely:
To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.
An example :
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
// Handling of the file structures is special.
// Since the look-up of the fd (fget() / fget_light())
// are lock-free, it is possible that look-up may race with
// the last put() operation on the file structure.
// This is avoided using atomic_long_inc_not_zero() on ->f_count
if (atomic_long_inc_not_zero(&file->f_count))
*fput_needed = 1;
else
/* Didn't get the reference, someone's freed */
file = NULL;
}
rcu_read_unlock();
....
return file;
atomic_long_inc_not_zero() detects if refcounts is already zero or
goes to zero during increment. If it does, we fail fget() / fget_light().
Finally, take a look at filldir_t, the second parameter type.
You probably want vfs_readdir() from fs/readdir.c.
In general though kernel code does not read directories, user code does.

int filedes? system calls read and write

Can any of you guys tell me what "int filedes" refers to?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
I've noticed I can put any int in there and it seems to work but I don't know what it's for...
Thanks.
It's a file descriptor. See http://en.wikipedia.org/wiki/File_descriptor. Since it represents an offset to a table lookup of files and pipes, there may be multiple descriptors that could return valid data. 0=stdin and 2=stderr will exist by default, or you can look at the open function to create your own.
The very first sentence of the description says, "the file associated with the open file descriptor, fildes". In other words, it indicates the file you're reading from. If your read function call works no matter what file descriptor you pass it, your program isn't doing what you think it is.
Somewhere inside the kernel, there is a table comprises of file descriptor entries on a per-process base. A file descriptor is a structure which describes the state of the file. What kind of information has a file descriptor? First of all, position from which the next read/write operation can be performed. Then, the access mode of the file, specified by the open system call. And last but not least, a data structure which represent the on-disk information of a file. In *nix, this is an inode structure. Here, the main question to be answered is : Where resides the blocks of the file in the disk. If you have an inode of a file in the memory, you can find quickly, where is the Nth block of the file(which means you don't need to parse the path every time, and scan each directory in the path to resolve the inode).

Open system call

I'm studying for my operating systems midterm and was wondering if I can get some help.
Can someone explain the checks and what the kernel does during the open() system call?
Thanks!
Very roughly, you can think of the following steps:
Translate the file name into an inode, which is the actual file system object describing the contents of the file, by traversing the filesystem data structures.
During this traversal, the kernel will check that you have sufficient access through the directory path to the file, and check access on the file itself. The precise checks depend on what modes were passed to open.
Create what's sometimes called an open file descriptor within the kernel. There is one of these objects for each file the kernel has opened on behalf of any process.
Allocate an unused index in the per-process file descriptor table, and point it at the open file descriptor.
Return this index from the system call as the file descriptor.
This description should be essentially correct for opening plain files and/or directories, but things are different for various sorts of special files, in particular for devices.
I would go back to what the prof told you - there a lot of things that happen during open(), depending on what you're opening (i.e. a device, a file, a directory), and unless you write what the professor's looking for, you'll lose points.
That being said, it mostly involves the checks to see if this open is valid (i.e. does this file exist, does the user have permissions to read/write it, etc), then an entry in the kernel handle table is allocated to keep track of the fd and its current file position (and of course, some other things)

Resources