Is the remove function guaranteed to delete the file? - c

The wording of the C99 standard seems a bit ambiguous regarding the behavior of the remove function.
In section 7.19.4.1 paragraph 2:
The remove function causes the file whose name is the string pointed to by filename
to be no longer accessible by that name. A subsequent attempt to open that file using that
name will fail, unless it is created anew.
Does the C99 standard guarantee that the remove function will delete the file on the filesystem, or could an implementation simply ignore the file, leaving it on the filesystem but inaccessible to the current program via that filename for the remainder of the program?

I don't think you're guaranteed anything by the C standard, which says (N1570, 7.21.4.1 2):
The remove function causes the file whose name is the string pointed to by filename
to be no longer accessible by that name. A subsequent attempt to open that file using that
name will fail, unless it is created anew. If the file is open, the behavior of the remove
function is implementation-defined.
So, if you had a pathological implementation, it could be interpreted, I suppose, to mean that calling remove() merely has the effect of making the file invisible to this running instance of this program, but that would be, as I said, pathological.
However, all is not utterly stupid! The POSIX specification for remove() says,
If path does not name a directory, remove(path) shall be equivalent to unlink(path).
If path names a directory, remove(path) shall be equivalent to rmdir(path).
And the POSIX documentation for unlink() is pretty clear:
The unlink() function shall remove a link to a file.
Therefore, unless your implementation (a) does not conform to POSIX requirements, and (b) is extremely pathological, you can be assured that the remove() function will actually try to delete the file, and will return 0 only if that name was actually removed.
Of course, on most filesystems currently in use, filenames are decoupled from the actual files, so if you've got five links to an inode, that file's going to keep existing until you delete all five of them.
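To illustrate the point about links, here is a minimal POSIX-only sketch. The function name survives_remove and the file names passed to it are made up for this example. It creates a file, adds a second hard link with link(), removes the first name, and checks that the data is still reachable through the second name.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the hard-link count of path, or -1 if stat() fails. */
static long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/*
 * Creates `name`, adds a second hard link `alias`, removes `name`, and
 * reports whether the file still exists under `alias`.
 * Returns 1 if it does, 0 if not, -1 on a setup error.
 * Both names are hypothetical; `alias` must not exist beforehand.
 */
int survives_remove(const char *name, const char *alias)
{
    FILE *f = fopen(name, "w");
    if (f == NULL)
        return -1;
    fputs("hello\n", f);
    if (fclose(f) != 0)
        return -1;

    if (link(name, alias) != 0) {       /* second directory entry, same inode */
        remove(name);
        return -1;
    }
    if (remove(name) != 0) {            /* drops only the first name */
        remove(alias);
        return -1;
    }

    int reachable = (link_count(alias) == 1);
    remove(alias);                      /* last link gone: now the file is freed */
    return reachable;
}
```

Calling survives_remove("a.txt", "b.txt") on a filesystem that supports hard links should return 1: remove() succeeded, yet the file lived on under its other name.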
References:
The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition
The Open Group Base Specifications Issue 7, IEEE Std 1003.1™, 2013 Edition
Note: "IEEE Std 1003.1 2004 Edition" is "IEEE Std 1003.1-2001 with corrigenda incorporated". "IEEE Std 1003.1 2013 Edition" is "IEEE Std 1003.1-2008 with corrigendum incorporated".

The C99 standard does not guarantee anything.
The file could remain there for any of the reasons unlink(2) can fail; for example, you may not have permission to remove it.
See http://linux.die.net/man/2/unlink for a list of everything that can go wrong.
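In practice this means checking the return value. A small sketch follows; the helper name remove_reporting is made up for this example. Note that ISO C only promises a nonzero return on failure; that errno carries the reason is a POSIX guarantee, so this assumes a POSIX system.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Tries to remove path. Returns 0 on success; otherwise prints the
 * reason and returns the errno value (meaningful on POSIX systems).
 */
int remove_reporting(const char *path)
{
    errno = 0;
    if (remove(path) == 0)
        return 0;

    int err = errno;    /* save it: fprintf may change errno */
    fprintf(stderr, "remove(%s) failed: %s\n", path, strerror(err));
    return err;
}
```

On a POSIX system, calling it on a path that does not exist should return ENOENT, and calling it on a file in a directory you cannot write to should return EACCES or EPERM.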

On Unix / Linux, there are several reasons for the file not to be removed:
You don't have write permission on the file's directory (in that case, remove() will return -1 and set errno, of course).
There is another hard link to the file. Then the file will remain on disk but only be accessible by the other path name(s).
The file is kept open by some process. In that case the directory entry is removed immediately, so that no subsequent open() can access the file (or an appropriate call will create a new file), but the file itself will remain on disk as long as any process keeps it open.
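The last case is easy to observe. The sketch below (POSIX only; the function name readable_after_unlink is made up for this example) writes to a new file, unlinks it while the descriptor is still open, and then reads the data back through that descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if the data is still readable through the open descriptor
 * after the name has been unlinked, 0 if not, -1 on a setup error.
 */
int readable_after_unlink(const char *path)
{
    const char msg[] = "still here";
    char buf[sizeof msg] = {0};

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        unlink(path);
        return -1;
    }

    if (unlink(path) != 0) {            /* directory entry disappears now */
        close(fd);
        return -1;
    }
    int name_gone = (access(path, F_OK) != 0);

    /* The inode is still alive because fd refers to it. */
    ssize_t n = -1;
    if (lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, sizeof buf);

    close(fd);                          /* last reference: storage is released */
    return name_gone && n == (ssize_t)sizeof msg &&
           memcmp(buf, msg, sizeof msg) == 0;
}
```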

Typically, that only unlinks the file from the file system. This means all the data that was in the file is still there. Given enough experience or time, someone would be able to get that data back.
There are some options to make sure the file can never be read again. The *nix utility shred will do that. If you want to do it from within a program, open the file for writing and write nonsense data over what you are looking to 'remove'.
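A rough sketch of the in-program approach, using only standard C. The function name overwrite_and_remove is made up for this example, and zeros stand in for the "nonsense data". Treat it as an illustration only: on journaling or copy-on-write filesystems and on SSDs, overwriting in place does not guarantee the old blocks are really gone.

```c
#include <stdio.h>

/*
 * Overwrites every byte of path with zeros, then removes the file.
 * Returns 0 on success, -1 on failure.
 */
int overwrite_and_remove(const char *path)
{
    static const char zeros[4096] = {0};

    FILE *f = fopen(path, "r+b");       /* update mode: does not truncate */
    if (f == NULL)
        return -1;

    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }

    for (long left = size; left > 0; ) {
        size_t chunk = left < (long)sizeof zeros ? (size_t)left : sizeof zeros;
        if (fwrite(zeros, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        left -= (long)chunk;
    }

    if (fclose(f) != 0)                 /* flushes the buffered writes */
        return -1;
    return remove(path) == 0 ? 0 : -1;
}
```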

Related

What is unsafe about fopen?

When using fopen(), Microsoft Visual Studio prints
warning C4996: 'fopen' was declared deprecated
As reason is given:
This function or variable may be unsafe. Consider using fopen_s instead.
What is unsafe with fopen() that's more safe with fopen_s()?
How can fopen() be used in a safe way (if possible)?
I don't want to know how to suppress the warning - there are enough Stack Overflow articles that answer that question.
The Microsoft CRT implements the secure library enhancements described in C11 Annex K, which is normative but optional. fopen_s() is described in section K.3.5.2.1. The issue is also covered by rule FIO06-C of the CERT institute.
At issue is that fopen() dates from simpler times when programmers could still assume that their program was the only one manipulating files, an assumption that has never really been true. It has no way to describe how access to the file by other processes is limited; CRT implementations traditionally opened the file without denying any access. Non-standard alternatives, like _fsopen(), have been used to fix this problem.
This has consequences: if the file is opened for writing, another process can also open the file for writing, and the file content will be hopelessly corrupted. If the file is opened for reading while another process is writing to it, then the view of the file content is unpredictable.
fopen_s() solves these problems by denying all access if the file is opened for writing and only allowing read access when the file is opened for reading.

linux function openat vs open, what does "at" mean?

I know how to use the two functions, but I do not know what the suffix "at" means. Does it represent the abbreviation of "another"?
"At" means that the working directory considered for the open call is at the given file descriptor, passed as the parameter. The *at() family of functions is useful because all path-relative operations are resolved against the same directory inode, even if that directory changes name or someone replaces it.
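A short sketch of the idea, assuming a POSIX.1-2008 system; the wrapper name open_in_dir is made up for this example. The directory is opened once, and the relative name is then resolved against that descriptor rather than against the process's current working directory.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Opens `name` relative to the directory `dirpath` using openat().
 * Returns a file descriptor, or -1 on error.
 */
int open_in_dir(const char *dirpath, const char *name, int flags, mode_t mode)
{
    /* O_DIRECTORY makes open() fail if dirpath is not a directory. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    /* `name` is looked up "at" dirfd, even if dirpath is renamed meanwhile. */
    int fd = openat(dirfd, name, flags, mode);

    close(dirfd);
    return fd;
}
```

In real code you would normally keep dirfd open and reuse it for several openat() calls; that is where the protection against the directory being renamed or replaced comes from.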

Trouble understanding file scope in C

I'm having trouble wrapping my head around files in C, specifically scope and duration. Say I create a file using
fopen("random.dat", "w");
How long does this file exist for? Does it get deleted once my program is finished running, or is it somehow reset? If I reopen the file further down in my code, only this time with the "r" reading argument, or "a", will I have conflicting streams since I'm opening a file that is already technically opened?
It's a normal file, just like all the other files on your computer. It exists until something deletes it, and its contents stay the same until something modifies it. It's not automatically deleted or "reset" when the program finishes. (C would be useless as a programming language if it couldn't save data to files that last longer than the program.)
However, since you're opening the file with the "w" option, the file will be truncated (reset to zero length) if it already exists — effectively, fopen deletes the existing file and creates a new empty one. That means that if you run your program a second time, the output from the first run will be replaced with the output from the second.
The effect of opening the same file more than once at the same time is platform-specific. On Unix/Linux it should work fine, but on Windows it may fail (though I haven't checked). But if you close the file (e.g. with fclose) before opening it again, that should work properly on any system.
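The difference between the modes is easy to check for yourself. A minimal sketch in standard C follows; the helper names write_text and file_size are made up for this example.

```c
#include <stdio.h>

/* Returns the size of path in bytes, or -1 if it cannot be opened. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}

/* Opens path with the given fopen mode, writes text, and closes it. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    int ok = (fputs(text, f) != EOF);
    if (fclose(f) != 0)     /* close before anyone reopens the file */
        ok = 0;
    return ok ? 0 : -1;
}
```

Writing to random.dat twice with mode "w" leaves only the second text in the file, while a following write with mode "a" makes the file grow; comparing file_size() after each step shows this. The file is still there after the program exits.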
The term file scope is used during compilation of a C program. It has nothing to do with something during execution.
Actually, the term is misleading; a better phrase would be compilation unit scope. It describes the visibility of names (variables, functions, structs, ...) defined outside of a block (statement), i.e. at the outermost level.
Files opened during program execution stay open until they are closed explicitly, independent of the program structure. However, since you need an object holding a reference to the file (a FILE * for the stdlib file functions), access is restricted to where that reference is visible, either by scope or because it was passed explicitly to a function.
A normal file that is opened, written, and closed will definitely not stop existing after the program exits or its reference goes out of scope (how else could you store data persistently?). It goes away only if it is explicitly deleted/unlinked or the filesystem itself is deleted (e.g. a Linux tmpfs, which only exists until the OS is shut down). This is called lifetime, btw.

Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Compiler: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original. But I want some way of preserving the file's ownership, mode, last access/modification, etc. . I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards but I have no idea how to do this.
The usual way to move a file in C is to use rename(2), which sometimes fails.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to query the size, permissions and other metadata of the source file with stat(2); copy the data by looping on read(2) and write(2) (using a buffer of several kilobytes) between open(2) and close(2); and copy the metadata using chmod(2), chown(2), utime(2). You might also care about copying extended attributes using getxattr(2), setxattr(2), listxattr(2). You could also in some cases use sendfile(2), as commented by David C. Rankin.
And if the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions (So using rename(2) is preferable when possible, because it is atomic according to its man page). The source file can always be modified (by another process) during the move operations...
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
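The rename-then-fallback approach could look roughly like the sketch below. The function names move_file, copy_then_unlink and copy_contents are made up for this example, and it is deliberately incomplete: it copies mode, owner (best effort) and timestamps, but not extended attributes or ACLs, and the fallback path is not atomic.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <utime.h>

/* Copies all bytes from descriptor in to descriptor out; 0 on success. */
static int copy_contents(int in, int out)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* write() may be partial */
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0)
                return -1;
            p += w;
            n -= w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Fallback: copy data and basic metadata, then unlink the source. */
int copy_then_unlink(const char *src, const char *dst)
{
    struct stat st;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    int rc = copy_contents(in, out);
    if (rc == 0 && fchmod(out, st.st_mode & 07777) != 0)
        rc = -1;
    close(in);
    if (close(out) != 0)
        rc = -1;
    if (rc != 0) {
        unlink(dst);                    /* do not leave a partial copy */
        return -1;
    }
    if (chown(dst, st.st_uid, st.st_gid) != 0) {
        /* best effort: changing the owner usually needs privileges */
    }
    struct utimbuf times = { st.st_atime, st.st_mtime };
    if (utime(dst, &times) != 0)
        return -1;
    return unlink(src);                 /* source goes away only at the end */
}

/* Tries rename(2) first; copies only if the paths are on different filesystems. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    return copy_then_unlink(src, dst);
}
```

When source and target are on the same filesystem, only the rename() branch runs, and ownership, mode and timestamps are preserved automatically because the inode itself does not change.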
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3), mv(1) could be tricky if the file name contains weird characters such as tabs or newlines, or starts with an initial -. See errno(3).
If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename(const char *oldname, const char *newname);

safely reading directory contents

Is it safe to read directory entries via readdir() or scandir() while files are being created or deleted in this directory? Should I prefer one over the other?
EDIT: When I say "safe" I mean entries returned by these functions are valid and can be operated without crashing the program.
Thanks.
It depends on what you mean by "safe". They are safe in the sense that they should not crash your program. However, if you are creating/deleting files as you are reading/scanning that directory, the set of files you get back might not be up-to-date.
When reading/scanning a directory for directory entries, the file pointer (a directory is just a special type of file), moves forward. However, depending upon the file system, there may be nothing to prevent new files from being created in an empty directory entry slot behind your file pointer. Consequently, newly added directory entries may not be immediately detected by readdir()/scandir(). Similar reasoning applies for file deletion / directory entry removal.
Hope this helps.
What's your definition of safety? You won't crash the system, and readdir/scandir won't crash your program, although they might give you data that is immediately out of date.
The usual semantics for reading a directory are that if you read the directory from beginning to end, you will see all of the files that didn't change during that time exactly once, and you will see files that were created or deleted during that time at most once.
On UNIX-like systems readdir() and scandir() are library functions implemented on top of the same underlying system call (getdents() in Linux, getdirentries() in BSD). So there shouldn't be much difference in their behavior in this regard. I think readdir() is a bit more standard, and therefore will be more portable.
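Since an entry can vanish between readdir() returning it and you using it, code that walks a directory should tolerate that. A sketch assuming a POSIX.1-2008 system follows; the function name count_existing_entries is made up for this example. It counts the entries that can still be examined and silently skips the ones that cannot, for instance because they were deleted in the meantime.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Counts the entries of dirpath (excluding "." and "..") that can
 * still be examined with fstatat(). Returns -1 on error.
 */
long count_existing_entries(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *e;
    for (;;) {
        errno = 0;                      /* to tell end-of-directory from error */
        e = readdir(d);
        if (e == NULL)
            break;
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        struct stat st;
        if (fstatat(dirfd(d), e->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
            count++;
        /* else: the entry vanished or cannot be examined; skip it */
    }
    int failed = (errno != 0);          /* errno as left by the last readdir() */

    closedir(d);
    return failed ? -1 : count;
}
```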

Resources