safely reading directory contents - c

Is it safe to read directory entries via readdir() or scandir() while files are being created or deleted in this directory? Should I prefer one over the other?
EDIT: When I say "safe" I mean entries returned by these functions are valid and can be operated without crashing the program.
Thanks.

It depends by what you mean as "safe". They are safe in the sense that they should not crash your program. However, if you are creating/deleting files as you are reading/scanning that directory, the set of files you get back might not be up-to-date.
When reading/scanning a directory for directory entries, the file pointer (a directory is just a special type of file), moves forward. However, depending upon the file system, there may be nothing to prevent new files from being created in an empty directory entry slot behind your file pointer. Consequently, newly added directory entries may not be immediately detected by readdir()/scandir(). Similar reasoning applies for file deletion / directory entry removal.
Hope this helps.

What's your definition of safety? You won't crash the system, and readdir/scandir won't crash your program. Although they might give you data that is immediately out of date.
The usual semantics for reading a directory are that if you read the directory from beginning to end, you will see all of the files that didn't change during that time exactly once, and you will see files that were created or deleted during that time at most once.
On UNIX-like systems readdir() and scandir() are library functions implemented on top of the same underlying system call (getdents() in Linux, getdirentries() in BSD). So there shouldn't be much difference in their behavior in this regard. I think readdir() is a bit more standard, and therefore will be more portable.

Related

Trouble understanding file scope in C

I'm having trouble wrapping my head around files in C, specifically scope and duration. Say I create a file using
fopen("random.dat", "w");
How long does this file exist for? Does it get deleted once my program is finished running, or is it somehow reset? If I reopen the file further down in my code, only this time with the "r" reading argument, or "a", will I have conflicting streams since I'm opening a file that is already technically opened?
It's a normal file, just like all the other files on your computer. It exists until something deletes it, and its contents stay the same until something modifies it. It's not automatically deleted or "reset" when the program finishes. (C would be useless as a programming language if it couldn't save data to files that last longer than the program.)
However, since you're opening the file with the "w" option, the file will be truncated (reset to zero length) if it already exists — effectively, fopen deletes the existing file and creates a new empty one. That means that if you run your program a second time, the output from the first run will be replaced with the output from the second.
The effect of opening the same file more than once at the same time is platform-specific. On Unix/Linux it should work fine, but on Windows it may fail (though I haven't checked). But if you close the file (e.g. with fclose) before opening it again, that should work properly on any system.
The term file scope is used during compilation of a C program. It has nothing to do with something during execution.
Actually, the term is missleading; a better phrase would be compilation unit scope. It describes the visibility of names (variables, functions, structs, ... ) defined outside of a block (statement), i.e. at the outermost level.
For files opened during program execution, they are open actually until closed explicitly, independent from the program structure. However, as you required an object holding a reference to the file, that does restrict visibility to where you have access to this reference (FILE * for the stdlib file-functions), either by scope, or by explicitly passing it to functions.
A normal file opened/written/closed will dwefinitively not stop existing after the program closes or its reference goes out of scope (how could you store data persistently?), but only if explicitly deleted/unlinked or the filesystem itself is deleted (e.g. for Linux tempfs, which only exists until the OS is shut down). This is called lifetime, btw.

Is the remove function guaranteed to delete the file?

The wording of the C99 standard seems a bit ambiguous regarding the behavior of the remove function.
In section 7.19.4.1 paragraph 2:
The remove function causes the file whose name is the string pointed to by filename
to be no longer accessible by that name. A subsequent attempt to open that file using that
name will fail, unless it is created anew.
Does the C99 standard guarantee that the remove function will delete the file on the filesystem, or could an implementation simply ignore the file -- leaving the file on filesystem, but just inaccessible to the current program via that filename-- for the remainder of the program?
I don't think you're guaranteed anything by the C standard, which says (N1570, 7.21.4.1 2):
The remove function causes the file whose name is the string pointed to by filename
to be no longer accessible by that name. A subsequent attempt to open that file using that
name will fail, unless it is created anew. If the file is open, the behavior of the remove
function is implementation-defined.
So, if you had a pathological implementation, it could be interpreted, I suppose, to mean that calling remove() merely has the effect of making the file invisible to this running instance of this program, but that would be, as I said, pathological.
However, all is not utterly stupid! The POSIX specification for remove() says,
If path does not name a directory, remove(path) shall be equivalent to unlink(path).
If path names a directory, remove(path) shall be equivalent to rmdir(path).
And the POSIX documentation for unlink() is pretty clear:
The unlink() function shall remove a link to a file.
Therefore, unless your implementation (a) Does not conform to POSIX requirements, and (b) is extremely pathological, you can be assured that the remove() function will actually try to delete the file, and will return 0 only if the file is actually deleted.
Of course, on most filesystems currently in use, filenames are decoupled from the actual files, so if you've got five links to an inode, that file's going to keep existing until you delete all five of them.
References:
The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition
The Open Group Base Specifications Issue 7, IEEE Std 1003.1™, 2013 EditionNote:"IEEE Std 1003.1 2004 Edition" is "IEEE Std 1003.1-2001 with corrigenda incorporated". "IEEE Std 1003.1 2013 Edition" is "IEEE Std 1003.1-2008 with corrigendum incorporated".
The C99 standard does not guarantee anything.
The file could remain there for any of the reasons unlink(2) can fail. For example you don't have permission to do this.
Consult http://linux.die.net/man/2/unlink for examples what can all go wrong.
On Unix / Linux, there are several reasons for the file not to be removed:
You dont't have write permission on the file's directory (in that case, remove() will return ERROR, of course)
there is another hard link on the file. Then the file will remain on disk but only be accessible by the other path name(s)
the file is kept open by any process. In that case the directory entry is removed immediatly, so that no subsequent open() can access the file (or an appropriate call will create a new file), but the file itself will remain on disk as long as any process keeps it open.
Typically, that only unlinks the file from the file system. This means all the data that was in the file, is still there. Given enough experience or time, someone would be able to get that data back.
There are some options to not have the file be read again, ever. The *nix utility shred will do that. If you are looking to do it from within a program, open the file to write, and write nonsense data over what you are looking to 'remove'.

Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Complier: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original. But I want some way of preserving the file's ownership, mode, last access/modification, etc. . I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards but I have no idea how to do this.
The usual way to move a file in C is to use rename(2), which sometimes fail.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to query the size, permission and other metadata of the source file with stat(2); copy the data looping on read(2), write(2) (using a buffer of several kilobytes), open(2), close(2) and the metadata using chmod(2), chown(2), utime(2). You might also care about copying attributes using getxattr(2), setxattr(2), listxattr(2). You could also in some cases use sendfile(2), as commented by David C. Rankin.
And if the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions (So using rename(2) is preferable when possible, because it is atomic according to its man page). The source file can always be modified (by another process) during the move operations...
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3), mv(1) could be tricky if the file name contains weird characters such as tab or or newlines, or starts with an initial -. See errno(3).
If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename (const char *oldname, const char *newname)

Finding file type in Linux programmatically

I am trying to find the file type of a file like .pdf, .doc, .docx etc. but programmatically not using shell command. Actually i have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in LKM and now i want that when an open/read system call is triggered then my LKM checks the file type.
I know that we have a current pointer which gives access to current process structure and we can use it to find the file name stored in dentry structure and also in Linux a file type is identified by a magic number stored in starting bytes of file. But i don't know that how to find file type and exactly where it is stored ?
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as a large a set of types as possible, it might not be a very good idea. :/
Actually i have to make an application which blocks access to files of a particular extension.
that's a flawed requirement. If you check by file extension, then you'll miss files that doesn't use the extension which is quite common in Linux since it does not use file extension.
The officially sanctioned way of detecting file type in Linux is by their magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library

Copy directory recursively in pure C on Linux/UNIX

Can someone guide me on a possible solution? I don't want to use /bin/cp or any other foreign apps. I want my program to be independent. Also I know that every system is quite specific, so I'm interested in UNIX/Linux compatibility.
How can I solve it? Just going down the source directory and creating a new directories in the target one and copying files in them, or there is a better solution?
BTW my goal is: copy all first level subdirs recursively into target dir if they are not present there
You really need some kind of recursive descent into the directory tree. Doing this, you can actually make this very portable (using opendir/readdir on Linux and FindFirstFile/FindNextFile on Windows). The problem that remains is the actual copying. You can use the C standard library for that with the following algorithm:
Open source file
Open target file
In a loop, fread a block of constant size from the source, then fwrite it to the target. Stop if the source file contains no more data
Hope this helps :)
Use the POSIX nftw(3) function to walk the tree you want to copy. You supply this function with a callback function that gets called on the path of each file/directory. Define a callback that copies the file/dir it gets called on into the destination tree. The fourth callback argument of type struct FTW * can be used to compute the relative path.
If you want to use only C, you could use dirent.h. Using this, you can recursively follow the directory structure. Then you could open the files in the binary mode, and write them to the desired location via write stream.

Resources