What is the difference between open with O_EXCL and using flock - c

Imagine I have two processes trying to open the same file in read-only mode. I want only one process to open the file and the other one to fail (essentially a file lock). What is the difference between doing
if (open("name", O_RDONLY | O_EXCL | O_NONBLOCK, 0666) == -1 && EWOULDBLOCK == errno) {
...
}
and
int fd = open("name", O_RDONLY | O_EXCL, 0666);
if (flock(fd, LOCK_EX | LOCK_NB) == -1 && EWOULDBLOCK == errno) {
...
}
Is it the same? Does the above work at all as I expect? If not then what does O_EXCL do? Is there a way to make the above one work as I want?

As the question What is the written-out word for O_EXCL (cross-referenced by Erdal Küçük in a comment) implies, if not states, the O_EXCL flag is only relevant when you are creating a file. Your other options to open() do not create a file, so O_EXCL does nothing useful here (POSIX leaves the result undefined when O_EXCL is set without O_CREAT; in practice it is usually ignored). Even if you were creating the file, it would not stop another program from opening the file for reading once it exists, even while other processes have it open. Note that if the other program tries to create the file with O_EXCL, then that other program will fail to create the file; that's what O_EXCL is for.
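To make the creation-only behavior concrete, here is a minimal sketch. The helper name create_exclusive and its return convention are made up for illustration; only open(), close() and the flags are real.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to create `path` exclusively.
 * Returns 0 if this call created the file, 1 if the file already
 * existed, and -1 on any other error (errno is left as set by open()). */
int create_exclusive(const char *path)
{
    /* O_EXCL only has a defined meaning together with O_CREAT. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return (errno == EEXIST) ? 1 : -1;
    close(fd);
    return 0;
}
```

Calling it twice on the same path should give 0 and then 1, while a plain open(path, O_RDONLY) from any process still succeeds once the file exists. That is why O_EXCL on its own does not act as a read lock.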
You will need to use one of the advisory locking mechanisms: flock(2) (found on Linux and BSD systems, and probably others, but not standardized by POSIX), or the POSIX functions lockf() or fcntl().
All the programs accessing the file will need to agree on using the advisory locking. If someone runs cat name, the cat command will not pay any attention to the advisory locks on the file name.
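A minimal sketch of the flock() variant, assuming a system that provides flock(2) in <sys/file.h>. The helper name open_locked and its return codes are hypothetical, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only and try to take an
 * exclusive advisory lock without blocking.
 * Returns the locked descriptor on success, -2 if the lock is already
 * held through another open of the file, or -1 on any other error. */
int open_locked(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved = errno;          /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return (saved == EWOULDBLOCK) ? -2 : -1;
    }
    return fd;  /* closing the descriptor releases the lock */
}
```

Every cooperating program would call open_locked() instead of a bare open(); the first caller gets a descriptor and later callers get -2 until that descriptor is closed.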
Some systems support mandatory file locking. When supported, this is activated by setting the SGID bit on the file while the group-execute bit is cleared. If the system supports mandatory file locking, it is likely that ls -l would list an l in the group-execute bit column; otherwise, it will likely report S. So the permissions might show as either of these:
-rw-r-lr--
-rw-r-Sr--
The file locking functions should not be confused with flockfile() which is for locking a single file stream in a multi-threaded program, not for locking files.

Related

Overwrite a file instead of appending to it?

I'm a newbie Linux programmer who has been asked to fill in for my colleague.
I opened up a file with this line:
err = open("path/foo.txt", O_RDWR | O_CREAT,0777);
When I write to it and hexdump the file, the output shows that I appended new content instead of overwriting the original.
How do I overwrite?
Also if you ask me "wth is "open"?" I'd refer to the newbie defense and say I don't know. The closest thing I know is fopen but I don't know what library/framework my colleague is using. Posix perhaps?
If you open the manual for open(2) via man 2 open, you will find a list of accepted flags.
You should be reading man pages in any case, and it's not hard. The only things you need to remember are that you can search for keywords by pressing /, jump between matches with n and N, and exit with q when you're done. Everything else is superfluous and only speeds things up a little.
The flag you're looking for is
O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0.
If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored.
Otherwise, the effect of O_TRUNC is unspecified.
So all you have to do is add this flag whenever you want to overwrite the file completely.
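As a rough sketch of what that looks like with open(2), the helper overwrite_file below is a hypothetical name and error handling is kept minimal:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the whole contents of `path` with `text`.
 * Returns 0 on success, -1 on failure. */
int overwrite_file(const char *path, const char *text)
{
    /* O_TRUNC discards any existing contents before the first write. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Writing a long string and then a shorter one through this helper should leave only the shorter one in the file; without O_TRUNC the tail of the longer string would remain.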
But if you can, you should be using fopen(3), in which case the mode you're looking for is w+, and it is equivalent to O_RDWR | O_CREAT | O_TRUNC in open(2).
open(2) is a low-level system call, and reading from a raw file descriptor efficiently and correctly is not that simple. The FILE* you get from fopen(3), on the other hand, is an implementation that handles most of the details, like buffering, for you; unless you can do better, you shouldn't avoid it.
By the way, the 0777 file permission is rarely needed and is not appropriate for regular data files. I'd recommend you open normal text files with 0666, which does everything you need except allow the file to be executed as a program.

Trying to implement append in my own shell Linux

I'm trying to implement an append command in my own shell.
I managed to append to an existing file, but whenever I try to append to a file that doesn't exist, it creates a file without any permissions (neither read nor write).
if (append) {
    fd = open(outfile,'a');
    lseek(fd,0,SEEK_END);
    close (STDOUT_FILENO) ;
    dup(fd);
    close(fd);
    /* stdout is now appended */
}
What should I do to create the file with permissions?
The open() system call doesn't use a character constant to indicate 'append'. Read the POSIX specification for open() — and look at O_APPEND etc. You need more flags than just O_APPEND, and you need three arguments to open() if you want to create the file if it doesn't exist (O_CREAT), etc.
if (append)
{
    fd = open(outfile, O_CREAT|O_APPEND|O_WRONLY, 0644);
    if (fd < 0)
        …deal with error…
}
You can write 0644 as S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, but the octal is shorter and (after 30+ years of practice) a lot easier to read. You can add write permissions for group (S_IWGRP) and others (S_IWOTH) if you like (0666 in octal), but unless you know you want group members and others to modify the files, it is safer to omit those, even though that goes against historical precedent. Users can and should set the shell umask value to 022 to prevent group and others from being able to write to files by default, but there's no harm (IMO) in being secure without that.
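Putting the open() call together with the redirection step from the question, a sketch might look like this. The function name redirect_stdout_append is hypothetical, and dup2() is used here in place of the close()/dup() pair, which is a substitution on my part rather than something the question requires.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make standard output append to `outfile`,
 * creating the file with mode 0644 (before umask) if it is missing.
 * Returns 0 on success, -1 on failure. */
int redirect_stdout_append(const char *outfile)
{
    int fd = open(outfile, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;
    /* dup2() closes the old stdout and duplicates fd onto it in one step. */
    if (dup2(fd, STDOUT_FILENO) == -1) {
        close(fd);
        return -1;
    }
    if (fd != STDOUT_FILENO)
        close(fd);
    return 0;
}
```

With O_APPEND set, no lseek() to the end is needed: each write through the descriptor is positioned at the current end of the file.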

Is there a way on a POSIX system to atomically create a directory if it doesn't exist?

Is there any way on a POSIX system to atomically create a directory only if it doesn't already exist?
Similar to
int fd = open( "/path/to/file", O_CREAT | O_EXCL | O_RDWR, 0644 );
This doesn't work:
int dfd = open( "/path/to/dir", O_DIRECTORY | O_CREAT | O_EXCL | O_RDWR, 0755 );
fails on my Solaris 11 and Ubuntu 20.04 systems with errno set to EINVAL on Solaris and ENOTDIR on Ubuntu.
The POSIX open() documentation states this for O_CREAT:
If the file exists, this flag has no effect except as noted under O_EXCL below. Otherwise, if O_DIRECTORY is not set ...
Well, it's not a file, and O_DIRECTORY is set.
(Inspired by the question Race condition stat and mkdir - there doesn't appear to be any way in POSIX to atomically create a directory if it doesn't already exist.)
To answer the question in your title, mkdir does this -- there's no need for extra flags as mkdir will always "atomically" create a directory if and only if it does not exist (and the path is not a file).
From the comments, it seems that you actually want to atomically create and open a directory, but this looks like an XY problem. Why would you need that, given that you cannot open a directory for writing in any case? If you first create and then open the directory (non-atomically), there is no difference in behavior and no race condition: if someone removes the directory in the interim, the open will simply fail.
If you're worried about only creating files in a directory with permissions set such that no one (else) can read them, you can check the permissions and ownership of the directory (with fstat) after opening it.

open() function parameters

Looking at the code block below, and taking into account the last parameter 0 passed to open(), does the write line work properly?
filename = argv[1];
string = "Example string";
if (stat(argv[1], &buf) != 0)
{
    fd = open(filename, O_WRONLY | O_CREAT, 0);
    if (fd < 0)
    {
        perror(filename);
        exit(1);
    }
    write(fd, string, strlen(string));
    close(fd);
}
else
{
    printf("%s file exists\n", filename);
}
From the manpage:
mode specifies the permissions to use in case a new file is created. This argument must be supplied when O_CREAT is specified in flags; if O_CREAT is not specified, then mode is ignored. The effective permissions are modified by the process's umask in the usual way: The permissions of the created file are (mode & ~umask). Note that this mode applies only to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor.
The following symbolic constants are provided for mode:
S_IRWXU 00700 user (file owner) has read, write and execute permission
S_IRUSR 00400 user has read permission
S_IWUSR 00200 user has write permission
S_IXUSR 00100 user has execute permission
S_IRWXG 00070 group has read, write and execute permission
S_IRGRP 00040 group has read permission
S_IWGRP 00020 group has write permission
S_IXGRP 00010 group has execute permission
S_IRWXO 00007 others have read, write and execute permission
S_IROTH 00004 others have read permission
S_IWOTH 00002 others have write permission
S_IXOTH 00001 others have execute permission
So, specifying a mode of zero, you will create a file with the permissions of 0 & ~umask, i.e. a file without any permissions.
What exactly the filesystem makes of this is not in the domain of the open() or write() functions.
It is valid. This is from the Linux open(2) manual page:
The mode argument specifies the file mode bits be applied when a new file is created. This argument must be supplied when O_CREAT or O_TMPFILE is specified in flags; if neither O_CREAT nor O_TMPFILE is specified, then mode is ignored. The effective mode is modified by the process's umask in the usual way: in the absence of a default ACL, the mode of the created file is (mode & ~umask). Note that this mode applies only to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor.
In theory, then, your access to the file remains valid until you call close(), as I understand the final sentence of the above excerpt.
Interesting question. POSIX says:
The argument following the oflag argument does not affect whether the file is open for reading, writing, or for both.
Which means that since you're handling the error return from open, if you reach the write line the behavior is well defined.
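A minimal sketch that exercises this, assuming an ordinary local filesystem that honors permission bits. The helper write_with_mode_zero is a hypothetical name, and O_EXCL is added here (it is not in the question's code) so that the call only succeeds when it really creates the file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with mode 0 and write `text`
 * through the descriptor returned by that same open() call.
 * Stores the file's permission bits in *perm_out (if not NULL) and
 * returns the number of bytes written, or -1 on failure. */
ssize_t write_with_mode_zero(const char *path, const char *text,
                             mode_t *perm_out)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0);
    if (fd == -1)
        return -1;
    /* The descriptor is open for writing even though the file has no
     * permission bits set; mode only affects later open() calls. */
    ssize_t n = write(fd, text, strlen(text));
    struct stat st;
    if (perm_out != NULL && fstat(fd, &st) == 0)
        *perm_out = st.st_mode & 0777;
    close(fd);
    return n;
}
```

After the call, a second open() of the same path by an unprivileged user would be expected to fail with EACCES, while the write above has already gone through.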
To expand a bit why this works. On most filesystems on unix-like systems, the meta-data related to a file should not affect already open file descriptors. You can for example remove a file that you have opened. This is in fact done quite commonly with temporary files, so that you don't need to remember to delete them on exit. The same applies to permissions or even ownership of the file. In fact, you can chroot while holding a file open and you can still write to it without actually being able to see it. You can even use file descriptor passing to give an open file descriptor to another process that wouldn't be allowed to open that file. This is quite commonly used for privilege separation. The permissions you had when creating a file descriptor are valid regardless of the changes to permissions later. So your question is a very interesting edge case because it asks if the filesystem permissions of the file are set before or after we create a file descriptor for it and POSIX seems to be clear on that.
I can only think of two exceptions to that right now. The first is when someone forcibly remounts a filesystem read-only; in that case the kernel will go through horrifying gymnastics to invalidate your file descriptor, which makes all operations on it fail. The second is AFS, where your permissions are actually checked when you close the file (or when the last user of the file on your local system closes it, which sends it to the server). That leads to hilarious problems where your time-limited access tokens were valid when you opened a file but are no longer valid when you close it. This is also why close returns errors (but that's another rant).
This is why I mentioned error handling above. Even though POSIX says that it should not have an effect, I could see AFS or certain other file systems refusing to open such a file.

flock(): removing locked file without race condition?

I'm using flock() for inter-process named mutexes, i.e. some process can decide to hold a lock on "some_name", which is implemented by locking a file named "some_name" in a temp directory:
lockfile = "/tmp/some_name.lock";
fd = open(lockfile, O_CREAT);
flock(fd, LOCK_EX);
do_something();
unlink(lockfile);
flock(fd, LOCK_UN);
The lock file should be removed at some point, to avoid filling the temp directory with hundreds of files.
However, there is an obvious race condition in this code; example with processes A, B and C:
A opens file
A locks file
B opens file
A unlinks file
A unlocks file
B locks file (B holds a lock on the deleted file)
C opens file (a new file one is created)
C locks file (two processes hold the same named mutex !)
Is there a way to remove the lock file at some point without introducing this race condition ?
Sorry if I reply to a dead question:
After locking the file, stat the path again, fstat the descriptor you hold, and compare the inode numbers, like this:
lockfile = "/tmp/some_name.lock";
while (1) {
    fd = open(lockfile, O_CREAT | O_RDWR, 0666);
    flock(fd, LOCK_EX);
    fstat(fd, &st0);
    /* The path may now name a different file, or no file at all. */
    if (stat(lockfile, &st1) == 0 && st0.st_ino == st1.st_ino) break;
    close(fd);
}
do_something();
unlink(lockfile);
flock(fd, LOCK_UN);
This prevents the race condition, because if a program holds a lock on a file that is still on the file system, every other program that has a leftover file will have a wrong inode number.
I actually proved it in the state-machine model, using the following properties:
If P_i has a descriptor locked on the filesystem then no other process is in the critical section.
If P_i is after the stat with the right inode or in the critical section it has the descriptor locked on the filesystem.
In Unix it is possible to delete a file while it is open; the inode is kept until every process that has it open has closed it.
In Unix it is also possible to check that a file has been removed from all directories: its link count becomes zero.
So instead of comparing the inode numbers of the old and new file paths, you can simply check the st_nlink count of the file that is already open. This assumes that it is just an ephemeral lock file and not a real mutex resource or device.
lockfile = "/tmp/some_name.lock";
for (int attempt = 0; attempt < timeout; ++attempt) {
    int fd = open(lockfile, O_CREAT, 0444);
    if (fd < 0)
        return false; // cannot create or open the lock file
    int done = flock(fd, LOCK_EX | LOCK_NB);
    if (done != 0) {
        close(fd);
        sleep(1); // lock held by another proc
        continue;
    }
    struct stat st0;
    fstat(fd, &st0);
    if (st0.st_nlink == 0) {
        close(fd); // lockfile deleted, create a new one
        continue;
    }
    do_something();
    unlink(lockfile); // nlink := 0 before releasing the lock
    flock(fd, LOCK_UN);
    close(fd); // release the ino if no other proc
    return true;
}
return false;
If you use these files for locking only, and do not actually write to them, then I suggest you treat the existence of the directory entry itself as an indication for a held lock, and avoid using flock altogether.
To do so, you need to construct an operation which creates a directory entry and reports an error if it already existed. On Linux and with most file systems, passing O_EXCL to open will work for this. But some platforms and some file systems (older NFS in particular) do not support this. The man page for open therefore suggests an alternative:
Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same file system (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
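The scheme quoted above could be sketched roughly as follows. The function name link_lock and its return codes are hypothetical, and for brevity the unique name uses only the PID; on a shared filesystem you would also want the hostname in it, as the man page suggests.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to take the lock named by `lockfile`.
 * Returns 0 if the lock was acquired, 1 if it is already held,
 * and -1 on any other error. Release it with unlink(lockfile). */
int link_lock(const char *lockfile)
{
    char unique[4096];
    int n = snprintf(unique, sizeof unique, "%s.%ld",
                     lockfile, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique)
        return -1;

    /* The unique file needs no O_EXCL: its name is ours alone. */
    int fd = open(unique, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    int acquired = 0;
    int link_errno = 0;
    if (link(unique, lockfile) == 0) {
        acquired = 1;
    } else {
        link_errno = errno;
        /* link() may report failure even though it succeeded (e.g. on
         * NFS); a link count of 2 on the unique file settles it. */
        struct stat st;
        if (stat(unique, &st) == 0 && st.st_nlink == 2)
            acquired = 1;
    }
    unlink(unique);   /* the lockfile name keeps the inode alive */
    if (acquired)
        return 0;
    return (link_errno == EEXIST) ? 1 : -1;
}
```

This relies on link(2) failing with EEXIST when the target name already exists, so only one caller can install the lockfile name at a time.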
So this looks like a locking scheme that is officially documented, which suggests a certain level of support and makes it something of a recommended practice. But I have seen other approaches as well. bzr, for example, uses directories instead of symlinks in most places. Quoting from its source code:
A lock is represented on disk by a directory of a particular name,
containing an information file. Taking a lock is done by renaming a
temporary directory into place. We use temporary directories because
for all known transports and filesystems we believe that exactly one
attempt to claim the lock will succeed and the others will fail. (Files
won't do because some filesystems or transports only have
rename-and-overwrite, making it hard to tell who won.)
One downside to the above approaches is that they won't block: a failed locking attempt will result in an error, but not wait till the lock becomes available. You will have to poll for the lock, which might be problematic in the light of lock contention. In that case, you might want to further depart from your filesystem-based approach, and use third party implementations instead. But general questions on how to do ipc mutexes have already been asked, so I suggest you search for [ipc] [mutex] and have a look at the results, this one in particular. By the way, these tags might be useful for your post as well.

Resources